1 Introduction

Face recognition (FR) is the problem of verifying or identifying a face from its image. It has received a great deal of attention from the scientific and industrial communities over the last three decades due to its wide range of applications in entertainment, smart cards, information security, law enforcement, access control and video surveillance [1, 2]. Various methods have been proposed for facial feature extraction and classification, among which representative approaches include subspace learning methods (e.g., Eigenface [3], Fisherface [4], Laplacianfaces [5], and subspace learning from image gradient orientations (IGO) [6]), kernel-based subspace learning methods [7–10], local binary pattern (LBP) methods [11, 12], Gabor feature-based classification methods [13–15], preprocessing-based approaches [16–21, 30, 31, 33, 34], the recently developed sparse representation-based methods [22–25], and so on. While most of these methods have already achieved impressive performance over large-scale galleries under controlled conditions, many challenges remain for FR in uncontrolled environments, such as partial occlusions, varying lighting conditions, expressions, and poses.

This paper focuses mainly on image preprocessing and on robustness to facial expression variations, lighting variations and partial occlusions. Preprocessing-based approaches seek to reduce the image to a more “canonical” form, and many well-known algorithms have been developed for this purpose. Histogram equalization (HE) is one simple example, but purpose-designed methods often exploit the fact that (on the scale of a face) naturally occurring illumination distributions typically have predominantly low spatial frequencies and soft edges, so that high-frequency information in the image is predominantly signal (i.e., intrinsic facial appearance). For example, the Multiscale Retinex (MSR) method of Jobson et al. [16] cancels much of the low-frequency information by dividing the image by a smoothed version of itself. Zhang et al. developed a technique called Gradientfaces (GF) that extracts an illumination-insensitive measure from the gradient domain [17]. Wang et al. [18] use a similar idea (with a different local filter) in the self-quotient image (SQI). Gross and Brajovic (GB) [19] developed an anisotropic smoothing method that relies on the iterative estimation of a blurred version of the original image. More recently, Tan and Triggs (TT) [20] presented a simple and efficient preprocessing chain that eliminates most of the effects of changing illumination while preserving the essential appearance details needed for recognition, and Wang et al. [21] presented an illumination normalization technique called Weberfaces (WF) based on Weber’s law. In addition, nonnegative matrix factorization (NMF) has become a popular data-representation method and has been widely used in image processing and pattern-recognition problems [30, 31].

In this paper, we propose an image preprocessing method based on a local approximation gradient (LAG). The traditional gradient is calculated only along the 0° and 90° directions; however, many other directional gradients exist in an image block. In order to take more directions into account, we introduce a novel LAG operator, which is computed by integrating these additional directional gradients. Generally, the main purpose of gradient calculation is to obtain edge information. Because it considers more directional gradients, LAG captures more edge information for each pixel and finally generates an LAG image, which yields a more robust dissimilarity measure between images. More edge information corresponds to regions of higher variance in the image, and such regions intuitively deserve more attention than flat regions; the orientation computed by LAG should therefore play an important role in a classification task. The LAG image is then normalized into an augmented feature vector using the “z-score” method, and the dimensionality of this vector is reduced by linear discriminant analysis (LDA) to yield a low-dimensional feature vector. Experimental results show that LAG achieves more robust results than state-of-the-art methods on the AR, Extended Yale B and CMU PIE face databases. The rest of the paper is organized as follows: Section 2 presents our preprocessing method, Sect. 3 reports experimental results, and Sect. 4 concludes the paper.

2 Image preprocessing method based on local approximation gradient

2.1 Local approximation gradient

Generally, the gradient operator is a first-order derivative operator. For an input image Γ(x, y), the gradient is the vector:

$$\nabla \Gamma (x,y) = \left( {\begin{array}{*{20}c} {G_{x} } \\ {G_{y} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {\frac{\partial \Gamma }{\partial x}} \\ {\frac{\partial \Gamma }{\partial y}} \\ \end{array} } \right).$$
(1)

The amplitude of the vector is expressed as:

$${\text{mag}}(\Gamma ) = (G_{x}^{2} + G_{y}^{2} )^{1/2}$$
(2)

The orientation angle of the vector is expressed as:

$$\beta (x,y) = \arctan \left( {\frac{{G_{y} }}{{G_{x} }}} \right)$$
(3)

For a discrete image Γ(x, y), \(G_{x}\) and \(G_{y}\) are usually calculated with different templates. An image template neighborhood T ∊ ℜ3×3 is shown in Fig. 1. We denote by α the intensity value at the central pixel location and by \(b_{m}\) (m = 1, 2, …, 8) the mth neighbor of α. The orientation angle of α can then be calculated according to \(\beta_{\alpha } = \arctan \left( {\frac{{b_{3} - b_{7} }}{{b_{1} - b_{5} }}} \right).\)

Fig. 1
figure 1

Image template neighborhood
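For illustration, the following minimal NumPy sketch computes the traditional gradient magnitude and orientation of formulas (1)–(3) from simple pixel differences. The assumption that \(b_{1}/b_{5}\) are the horizontal neighbors and \(b_{3}/b_{7}\) the vertical neighbors of α is taken from the orientation formula above, and the function name is only illustrative.

```python
import numpy as np

def gradient_3x3(image):
    """Traditional gradient magnitude and orientation (formulas (1)-(3)).

    Assumes b1/b5 are the right/left neighbours and b3/b7 the top/bottom
    neighbours of the centre pixel alpha (indexing as in Fig. 1).
    """
    img = image.astype(np.float64)
    padded = np.pad(img, 1, mode='reflect')   # mirror the one-pixel border
    b1 = padded[1:-1, 2:]                     # right neighbour
    b5 = padded[1:-1, :-2]                    # left neighbour
    b3 = padded[:-2, 1:-1]                    # top neighbour
    b7 = padded[2:, 1:-1]                     # bottom neighbour
    gx = b1 - b5                              # gradient along 0 degrees
    gy = b3 - b7                              # gradient along 90 degrees
    mag = np.sqrt(gx ** 2 + gy ** 2)          # formula (2)
    beta = np.arctan2(gy, gx)                 # formula (3), quadrant-aware
    return mag, beta
```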

We can see that the traditional gradient is calculated only along 0° and 90°; however, many other directional gradients exist in an image block. To take more directions into account, we design a novel gradient operator called LAG. The local approximation gradient is likewise calculated in a local image block. Figure 2 shows the eight different directions in a 3 × 3 image template neighborhood.

Fig. 2
figure 2

Eight different directions in image template neighborhood

For an input image Γ(x, y), we can calculate its LAG vector according to formula (4).

$$\nabla \Gamma^{\text{LAG}} (x,y) = \left( {\begin{array}{*{20}c} {G_{0^{\circ}} } \\ {G_{45^{\circ}} } \\ {G_{90^{\circ}} } \\ {G_{135^{\circ}} } \\ {G_{180^{\circ}} } \\ {G_{225^{\circ}} } \\ {G_{270^{\circ}} } \\ {G_{315^{\circ}} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {b_{1} - \alpha } \\ {b_{2} - \alpha } \\ {b_{3} - \alpha } \\ {b_{4} - \alpha } \\ {b_{5} - \alpha } \\ {b_{6} - \alpha } \\ {b_{7} - \alpha } \\ {b_{8} - \alpha } \\ \end{array} } \right)$$
(4)

The amplitude of the vector is expressed as:

$${\text{mag}}(\Gamma^{\text{LAG}} ) = \left( {G_{0^{\circ}}^{2} + G_{45^{\circ}}^{2} + \cdots + G_{315^{\circ}}^{2} } \right)^{1/2} = \left( {\sum\limits_{m = 1}^{8} {(b_{m} - \alpha )^{2} } } \right)^{1/2}$$
(5)

The orientation angle of the vector is expressed as:

$$\beta^{\text{LAG}} = \arctan \left( {\frac{1}{8}\sum\limits_{m = 1}^{8} {(b_{m} - \alpha )} } \right)$$
(6)
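A minimal sketch of formulas (4)–(6) for a single pixel is given below; the neighbor ordering b1, …, b8 is assumed to follow Fig. 2, and the example values and function name are purely illustrative.

```python
import numpy as np

def lag_pixel(neighbours, alpha):
    """LAG vector, amplitude and orientation of one pixel (formulas (4)-(6)).

    `neighbours` holds [b1, ..., b8] of the 3x3 neighbourhood of the centre
    value `alpha`, ordered as in Fig. 2.
    """
    g = np.asarray(neighbours, dtype=np.float64) - float(alpha)  # formula (4)
    mag = np.sqrt(np.sum(g ** 2))                                # formula (5)
    beta = np.arctan(np.mean(g))                                 # formula (6)
    return g, mag, beta

# Illustrative values: centre intensity 120 and its eight neighbours
g, mag, beta = lag_pixel([118, 121, 119, 125, 130, 117, 122, 124], 120)
```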

Suppose there are N pixels in an image. We treat a pixel α as a center and determine its l neighbor pixels on a square of radius L using the city-block distance (note that for pixels on the margin of the image, we first enlarge the image by a mirror transform and then determine the neighbors). These neighbor pixels form a square neighbor set. We denote by \(\varOmega_{\alpha }^{L} = \left\{ {b_{1} ,b_{2} , \ldots ,b_{l} } \right\},\;l = \# (\varOmega_{\alpha }^{L} )\) the Lth neighbor set of pixel α, where L is the radius and l is the number of neighbor pixels.

Figure 3 illustrates the neighbor sets for different radii L. Figure 3a shows the first neighbor set \(\varOmega_{\alpha }^{1}\) (including 8 neighbor pixels), Fig. 3b the second neighbor set \(\varOmega_{\alpha }^{2}\) (including 24 neighbor pixels), and Fig. 3c the third neighbor set \(\varOmega_{\alpha }^{3}\) (including 48 neighbor pixels).

Fig. 3
figure 3

Square neighbor sets for radius L. a l = 8 (L = 1), b l = 24 (L = 2), and c l = 48 (L = 3)

For the Lth neighbor set of pixel α, we can calculate the corresponding orientation angle according to formula (7)

$$\beta^{\text{LAG}} = \arctan \left( {\frac{1}{l}\sum\limits_{m = 1}^{l} {(b_{m} - \alpha )} } \right),\quad \varOmega_{\alpha }^{L} = \left\{ {b_{1} ,b_{2} , \ldots ,b_{l} } \right\},\quad l = \# (\varOmega_{\alpha }^{L} )$$
(7)
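The whole orientation LAG image can then be obtained by applying formula (7) at every pixel. The following sketch is one possible NumPy implementation under the stated mirror-transform border handling; the function name and the vectorized shifting strategy are our own illustrative choices.

```python
import numpy as np

def lag_image(image, L=2):
    """Orientation LAG image for neighbour radius L (formula (7)).

    Every pixel uses the l = (2L+1)**2 - 1 neighbours inside its square
    window; border pixels rely on a mirrored copy of the image.
    """
    img = image.astype(np.float64)
    padded = np.pad(img, L, mode='reflect')        # mirror transform
    h, w = img.shape
    l = (2 * L + 1) ** 2 - 1                       # number of neighbours
    acc = np.zeros_like(img)
    for dy in range(-L, L + 1):
        for dx in range(-L, L + 1):
            if dy == 0 and dx == 0:
                continue                           # skip the centre pixel
            shifted = padded[L + dy:L + dy + h, L + dx:L + dx + w]
            acc += shifted - img                   # accumulate (b_m - alpha)
    return np.arctan(acc / l)                      # formula (7)
```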

For each pixel of an image, we consider all neighbor pixels in a local block and calculate the corresponding directional gradients; the LAG is obtained by integrating them. Since it considers more directional gradients, LAG captures more edge information, and more edge information corresponds to higher variance in the image. Intuitively, one pays more attention to regions of higher variance than to flat regions, so the orientation computed by LAG should play an important role in a classification task.

In the following, we compare the dissimilarities between differently preprocessed images under several distance metrics. Consider the motivating example in Fig. 4, which shows the differently preprocessed images (Original, HE, GF, GB, MSR, TT, WF, and LAG images); the resolution of all images is adjusted to 60 × 60. For GB [19] we set λ = 1. For MSR [16] we set hsiz = [7, 15, 21]. For GF [17] we set σ = 0.75. For TT [20] we set γ = 0.2, σ 0 = 1, σ 1 = 2, α = 0.1, and τ = 10. For WF [21] we set σ = 0.75, nn = 9, and α = 2. For the proposed LAG, we set L = 2. Table 1 shows the dissimilarity between the neutral expression image and the other images under different measures. The other images consist of a smiling expression image, a sunglasses image and a scarf image from the same subject, and a neutral expression image from a different subject. The measures are the Euclidean, Cosine, Correlation and Manhattan distances [29]. As can be seen in Table 1, seven methods (None, HE, GF, GB, MSR, TT and WF) assign a smaller distance between the original neutral image and a neutral image from a different subject than between the original and the same subject's occluded images. In contrast, LAG yields a larger distance to the image of a different person than to the occluded images of the same person. We can therefore conclude that LAG provides a more robust dissimilarity measure.

Fig. 4
figure 4

Images used for the dissimilarity measurement. a Original images, b HE images, c GF images, d GB images, e MSR images, f TT images, g WF images, h LAG images. Each row from left to right shows, in turn, the neutral expression, smiling expression, occluded (sunglasses), occluded (scarf) and another person’s neutral expression

Table 1 Dissimilarity comparison between the neutral expression image and the other images with different measures
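For reference, the four dissimilarity measures used in Table 1 can be computed as in the sketch below; the function operates on flattened 60 × 60 images and is only an illustrative re-statement of the standard definitions.

```python
import numpy as np

def dissimilarities(img_a, img_b):
    """Euclidean, Cosine, Correlation and Manhattan dissimilarities."""
    x = img_a.astype(np.float64).ravel()
    y = img_b.astype(np.float64).ravel()
    euclidean = np.linalg.norm(x - y)
    manhattan = np.sum(np.abs(x - y))
    cosine = 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    xc, yc = x - x.mean(), y - y.mean()
    correlation = 1.0 - np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
    return euclidean, cosine, correlation, manhattan
```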

2.2 Image normalization

Assume that we are given a set of n input images {Γ i }, i = 1, 2, …, n. We obtain the corresponding local approximation gradient images {Ψ i }, i = 1, 2, …, n by calculating the LAG of each image. Let y i be the p-dimensional column vector obtained by writing Ψ i in lexicographic ordering. All column vectors form a matrix B = [y 1, …, y n ] ∊ ℜp×n. Each gradient image is normalized using the “z-score” method [32]. We denote by δ i the normalized vector of y i . The qth dimension of δ i is calculated according to formula (8).

$$\delta_{i}^{\left( q \right)} = \left( {y_{i}^{\left( q \right)} - \mu^{\left( q \right)} } \right)/\sigma^{\left( q \right)} ,\quad i = 1, \ldots ,n,\quad q = 1, \ldots ,p$$
(8)

where y (q) i and δ (q) i are, respectively, the qth dimensions of the column vectors \(y_{i}\) and \(\delta_{i}\), and μ (q) and σ (q) are the corresponding mean and standard deviation, calculated as \(\mu^{(q)} = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {y_{i}^{(q)} }\) and \(\sigma^{(q)} = \sqrt {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {(y_{i}^{(q)} - \mu^{(q)} )^{2} } }\). Computing all p dimensions according to formula (8) yields, for each input image Γ i , the corresponding normalized vector \(\delta_{i}\).
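A compact sketch of the “z-score” normalization of formula (8) over the matrix B is shown below; the guard against a zero standard deviation is our own addition for numerical safety.

```python
import numpy as np

def zscore_normalize(B):
    """Dimension-wise z-score normalisation of B in R^{p x n} (formula (8)).

    Each column of B is one lexicographically ordered LAG image y_i; the
    mean and standard deviation of each dimension are taken over all n images.
    """
    mu = B.mean(axis=1, keepdims=True)       # mu^(q)
    sigma = B.std(axis=1, keepdims=True)     # sigma^(q)
    sigma[sigma == 0] = 1.0                  # guard against flat dimensions
    return (B - mu) / sigma                  # delta_i^(q)
```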

2.3 Dimensionality reduction

The dimensionality of the normalized vector is reduced by LDA to yield a low-dimensional feature vector. LDA searches for the feature vectors in the underlying space that best discriminate among classes. More formally, given a number of independent features relative to which the data are described, LDA creates the linear combination of these that yields the largest mean differences between the desired classes. Mathematically, over all the samples of all classes, we define two measures: (1) the within-class scatter matrix, given by formula (9)

$$S_{w} = \sum\limits_{c = 1}^{C} {\sum\limits_{h = 1}^{{l^{c} }} {(\delta_{h,c} - m^{c} )(\delta_{h,c} - m^{c} )^{T} } }$$
(9)

where δ h,c denotes the hth training sample in class c, l c is the number of training samples in class c, m c is the mean of the training samples in class c, and C is the number of classes. (2) The other is the between-class scatter matrix, given by formula (10)

$$S_{b} = \sum\limits_{c = 1}^{C} {l^{c} } (m^{c} - m)(m^{c} - m)^{T}$$
(10)

where m represents the mean of all classes.

The goal is to maximize the between-class measure while minimizing the within-class measure. The discriminative projection basis W and the eigenvalues Λ are obtained by solving the regularized linear discriminant analysis (LDA) eigendecomposition:

$$(S_{w} + \varepsilon I)^{ - 1} S_{b} W = W\Lambda$$
(11)

where ε is a small regularization constant (\(10^{ - 3}\) in our experiments) and I is the identity matrix. Thus, the original p-dimensional space is projected onto a final d-dimensional space using LDA.
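The scatter matrices and the regularized eigendecomposition of formulas (9)–(11) might be implemented as in the sketch below. For clarity it works directly in the p-dimensional space and omits the preliminary PCA step mentioned in Sect. 3.1, so it is a conceptual illustration rather than the exact implementation used in the experiments.

```python
import numpy as np

def lda_projection(X, labels, d, eps=1e-3):
    """Regularised LDA projection basis (formulas (9)-(11)).

    X is p x n with one normalised feature vector delta_i per column;
    eps corresponds to the regularisation constant (1e-3) in formula (11).
    """
    labels = np.asarray(labels)
    p = X.shape[0]
    m = X.mean(axis=1, keepdims=True)                 # global mean
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)           # class mean m^c
        Sw += (Xc - mc) @ (Xc - mc).T                 # formula (9)
        Sb += Xc.shape[1] * (mc - m) @ (mc - m).T     # formula (10)
    # formula (11): (Sw + eps I)^{-1} Sb W = W Lambda
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + eps * np.eye(p), Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:d]].real                   # d leading projections
```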

2.4 Robustness to occlusion

In order to evaluate the robustness of our algorithm to occlusion, we considered a single-sample-per-class experiment using neutral face images taken from the AR database. Our training set consisted of 100 face images of 100 different subjects taken from session 1, and our testing set consisted of one image per subject taken from session 2. We evaluated the performance of our algorithms in the case of synthetic occlusions. Our algorithms are LAG and LAG with LDA (LAG-LDA). All test images were artificially occluded by a dinosaur patch of increasing size at a random location; Fig. 5 shows the corresponding corrupted images. Figure 6 shows the recognition rates of the different algorithms without LDA, and Fig. 7 shows the recognition rates with LDA. As shown in Figs. 6 and 7, our algorithms (LAG and LAG-LDA) consistently outperform the compared methods; they maintain a recognition rate over 75 % even when the percentage of occlusion is about 4 %. In addition, LAG also achieves better FR accuracy than the compared methods both without and with LDA.

Fig. 5
figure 5

Artificial occlusion images using a dinosaur patch of increasing size at random location
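The synthetic occlusion itself can be reproduced with a few lines of NumPy, as sketched below; an arbitrary patch array stands in for the dinosaur image, and the helper name is purely illustrative.

```python
import numpy as np

def occlude_random(image, patch, rng=None):
    """Paste an occluding patch at a random position inside a face image."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    ph, pw = patch.shape[:2]
    y = rng.integers(0, image.shape[0] - ph + 1)      # random top-left corner
    x = rng.integers(0, image.shape[1] - pw + 1)
    out[y:y + ph, x:x + pw] = patch                   # overwrite with the patch
    return out
```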

Fig. 6
figure 6

Recognition rate as a function of the percentage of occlusion (without LDA)

Fig. 7
figure 7

Recognition rate as a function of the percentage of occlusion (with LDA)

Figure 8 compares the recognition rates of LAG and LAG-LDA; as shown in Fig. 8, the two results are very close. Figure 9 shows the running time of all methods (with and without LDA). It is worth noting that the methods with LDA achieve a faster running time than those without LDA.

Fig. 8
figure 8

Recognition rate as a function of the percentage of occlusion (LAG and LAG-LDA)

Fig. 9
figure 9

Running time as a function of the percentage of occlusion

3 Experiments

In this section, we evaluate the robustness of our method for FR. Three publicly available face databases, namely AR [26], Extended Yale B [27], and CMU PIE [28], are used for experimental evaluation; all images are resized to 60 × 60. To verify the effectiveness of the proposed method, in this series of experiments we evaluate its performance and compare it with that of several methods using LDA and the 1-NN classifier. To evaluate performance in the case of synthetic occlusions, a series of similar experiments was also performed in which all test images were artificially occluded by a dinosaur patch of resolution 30 × 30 at a random position.

3.1 Parameter selection

For GB [19] we set λ = 1 for all experiments. For MSR [16] we set hsiz = [7, 15, 21]. For GF [17] we set σ = 0.75. For TT [20] we set γ = 0.2, σ 0 = 1, σ 1 = 2, α = 0.1, and τ = 10. For WF [21] we set σ = 0.75, nn = 9, and α = 2. For the proposed LAG, we set the neighbor radius L = 2 for AR and Extended Yale B, and L = 4 for CMU PIE. In the 1-NN classifier, the dissimilarity measure based on cosine distance is adopted. In LDA, the PCA ratio is set to 1, so all non-zero eigenvalues are kept.
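A minimal sketch of the 1-NN classifier with cosine dissimilarity used throughout the experiments is given below; the function name and the matrix-based formulation are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def nn_classify_cosine(train, train_labels, test):
    """1-NN classification with cosine dissimilarity.

    `train` and `test` hold one low-dimensional feature vector per row.
    """
    train_labels = np.asarray(train_labels)
    tr = train / np.linalg.norm(train, axis=1, keepdims=True)
    te = test / np.linalg.norm(test, axis=1, keepdims=True)
    dist = 1.0 - te @ tr.T                    # cosine dissimilarity matrix
    return train_labels[np.argmin(dist, axis=1)]
```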

3.2 AR database

The AR database consists of more than 4000 frontal view face images of 126 subjects. Each subject has up to 26 images taken in two sessions. The first session contains 13 images, numbered from 1 to 13, including different facial expressions (1–4), illumination changes (5–7), and different occlusions under different illumination changes (8–13). The second session duplicates the first session 2 weeks later. We randomly select a subset with 100 subjects.

Figure 10 shows a sample of images used in our experiments. Figure 10a shows the non-occluded images of session 1. Figure 10b shows the non-occluded images of session 2. Figure 10c shows the corresponding artificial occlusion images of (b). Figure 10d shows the face images occluded by the scarf or sunglasses.

Fig. 10
figure 10

Face images of the same subject taken from the AR database. a Shows the non-occluded images of session 1. b Shows the non-occluded images of session 2. c Shows the corresponding artificial occlusion images of b. d Shows the face images occluded by the scarf or sunglasses

We investigate the robustness of our scheme in the cases of facial expression variations, illumination variations, occlusion (sunglasses) with illumination changes, and occlusion (scarf) with illumination changes. Table 2 provides the detailed information of each experiment (referred to as Exp.). The best recognition rates are shown in Tables 3, 4 and 5; the dimension that results in the best performance for each method is given in parentheses.

Table 2 Detailed information of all experiments of the AR database
Table 3 Best recognition rate comparison on AR face database with different methods (from Exp. 1 to Exp. 4)
Table 4 Best recognition rate comparison on AR face database with different methods (from Exp. 5 to Exp. 8 and with LDA)
Table 5 Best recognition rate comparison on AR face database with different methods (from Exp. 9 to Exp. 12)

From Exp. 1 to Exp. 4, we use images 1–4 of session 1 for training. As can be seen from Table 3, LAG-LDA achieves a better recognition rate than the other methods: 93.25 % for different expressions (Exp. 1) and 89.33 % for illumination changes (Exp. 2). For Exp. 3 and Exp. 4, which are more difficult experiments on the AR database, LAG-LDA achieves 67 % for occlusion (sunglasses) with illumination changes and 67.33 % for occlusion (scarf) with illumination changes.

Furthermore, from Exp. 5 to Exp. 8, we use images 8–13 of session 1 for training; these are also very difficult experiments on the AR database. As can be seen from Table 4, LAG-LDA again achieves a better recognition rate than the other methods: 75.25 % for different expressions (Exp. 5) and 90.67 % for illumination changes (Exp. 6). Because the occlusion images are used as the training set, LAG-LDA also achieves good results in Exp. 7, with 92 % for occlusion (sunglasses) with illumination changes, and 73.33 % in Exp. 8 for occlusion (scarf) with illumination changes.

In order to evaluate the performance of our algorithm in the case of synthetic occlusions, we carried out four additional experiments (Exp. 9, Exp. 10, Exp. 11 and Exp. 12) on the AR database. They mirror Exp. 1, Exp. 2, Exp. 5 and Exp. 6, the only difference being that all test images were artificially occluded by a dinosaur patch at a random position. Table 5 shows the corresponding results. As can be seen from Table 5, the proposed LAG-LDA performs better than the other state-of-the-art methods, whose performance is not satisfactory. It appears that the seven compared methods are sensitive to artificial occlusion, whereas LAG-LDA is more robust to it and thus achieves better results.

3.3 Extended Yale B database

The Extended Yale B database contains 2414 frontal face images of 38 individuals. For each individual, about 64 pictures were taken under various laboratory-controlled lighting conditions. Figure 11 shows some of the face images used in our experiments: Fig. 11a shows the non-occluded images of one subject, and Fig. 11b the corresponding artificially occluded images.

Fig. 11
figure 11

Partial face images of the same subject taken from the Extended Yale B database. a Shows the non-occluded images of one subject, and the corresponding images were artificially occluded by a dinosaur patch at random position in (b)

We carried out the following two experiments on the Extended Yale B database. In Exp. 1, a random subset with d (= 15, 20, 25, 32) labeled images per individual is taken to form the training set, and the rest of the database is considered the testing set. We summarize the best recognition rates of these methods in Table 6; the dimension that results in the best performance for each method is given in parentheses. Exp. 2 is identical to Exp. 1 except that all test images were artificially occluded by a dinosaur patch at a random position; Table 7 summarizes the corresponding best recognition rates, again with the best-performing dimension in parentheses.

Table 6 Best recognition rate comparison on Extended Yale B face database with different methods
Table 7 Best recognition rate comparison on Extended Yale B face database with different methods (dinosaur occlusion)

As can be seen from Table 6, LAG-LDA obtains a better recognition rate than the other algorithms; in all experiments, its recognition rate exceeds 99 %. For each method, we calculate the standard deviation of all the recognition rates. As can be seen from Table 6, LAG-LDA obtains smaller standard deviations than the other methods, so its performance is more stable. Therefore, our algorithm is more robust to illumination changes.

As can be seen from Table 7, the performance of all methods degrades compared with Table 6. However, LAG-LDA still obtains a better recognition rate than the other algorithms, again exceeding 99 % in all experiments. For each method, we also calculate the standard deviation of all the recognition rates; LAG-LDA still obtains a smaller standard deviation than the other methods, so its performance remains more stable. We therefore believe that our algorithm is more robust to illumination changes combined with occlusions.

3.4 CMU PIE database

The CMU PIE database consists of more than 41,000 face images of 68 subjects, containing faces under varying pose, illumination, and expression. We used the five near-frontal poses (C05, C07, C09, C27, C29), giving a total of 170 images per subject. Figure 12 shows some of the face images used in our experiments: Fig. 12a shows the non-occluded images of one subject, and Fig. 12b the corresponding artificially occluded images.

Fig. 12
figure 12

Partial face images of the same subject taken from the CMU PIE database. a Shows the non-occluded images of one subject, and the corresponding images were artificially occluded by a dinosaur patch at random position in (b)

We carried out the following two experiments on the CMU PIE database. In Exp. 1, a random subset with d (= 10, 15, 20 and 25) labeled images per individual is taken to form the training set, and the rest of the database is considered the testing set. We summarize the best recognition rates of these methods in Table 8; the dimension that results in the best performance for each method is given in parentheses. Exp. 2 is identical to Exp. 1 except that all test images were artificially occluded by a dinosaur patch at a random position; Table 9 summarizes the corresponding best recognition rates, again with the best-performing dimension in parentheses.

Table 8 Best recognition rate comparison on CMU PIE face database with different methods
Table 9 Best recognition rate comparison on CMU PIE face database with different methods (dinosaur occlusion)

As can be seen from Table 8, our method LAG-LDA achieves the best performance among all methods, irrespective of the training sample size; in all experiments, its recognition rate exceeds 92 %. For each method, we calculate the standard deviation of all the recognition rates. Our method obtains the second smallest standard deviation; however, it outperforms the method with the smallest standard deviation (HE-LDA) in recognition accuracy by approximately 84, 85, 86 and 86 % for training sample sizes of 10, 15, 20 and 25, respectively. In addition, LAG-LDA outperforms the second best method (MSR-LDA) by approximately 6, 4, 5 and 5 % for the same training sample sizes. We therefore believe that our algorithm is more robust to pose changes.

As can be seen from Table 9, the performance of all methods degrades compared with Table 8. However, LAG-LDA still achieves the best performance among all methods, irrespective of the training sample size; in all experiments, its recognition rate exceeds 84 %. For each method, we also calculate the standard deviation of all the recognition rates. Our method does not obtain the smallest standard deviation; however, it outperforms the method with the smallest standard deviation (HE-LDA) in recognition accuracy by approximately 76, 79, 79 and 80 % for training sample sizes of 10, 15, 20 and 25, respectively. In addition, LAG-LDA outperforms the second best method (GF-LDA) by approximately 21, 21, 22 and 20 % for the same training sample sizes. We therefore believe that our algorithm is more robust to pose changes combined with occlusions.

As can be seen from Tables 8 and 9, LAG-LDA obtains better results with or without occlusion, whereas the other methods are very sensitive to artificial occlusion. Therefore, LAG-LDA is the more stable method.

3.5 Parameter analysis

In this section, we discuss the influence of the neighbor radius L. The performance of LAG-LDA with different neighbor radii is evaluated on the face databases mentioned above. On the AR database, the first session of each individual is used for training and the second session for testing. On the Extended Yale B database, 15 random images per class are chosen for training and the rest for testing. Likewise, on the CMU PIE database, 15 random images per class are selected for training and the remaining images for testing.

The recognition rates of LAG-LDA with respect to different neighbor radii on the three databases are shown in Table 10. As can be seen from Table 10, LAG-LDA achieves better results when L is set to 2 on AR and Extended Yale B. It is worth noting that LAG-LDA achieves better results when L is set to 4 on CMU PIE. Therefore, we generally choose the neighbor radius L = 2. For a face database with pose changes, we intuitively believe that a larger neighbor radius covers a larger local image block and can better adapt to pose changes; in this paper, we therefore choose L = 4 for CMU PIE.

Table 10 The performance of LAG-LDA with different neighbor radii

4 Conclusions

In this paper, we propose an image preprocessing method based on the LAG. The LAG operator is calculated by integrating more directional gradients; because of this, it captures more edge information for each pixel of an image and finally generates an LAG image, which yields a more robust dissimilarity measure between images. The LAG image is then normalized into a feature vector using the “z-score” method, and the dimensionality of this vector is reduced by linear discriminant analysis to yield a low-dimensional feature vector. Our experiments show that the proposed method is robust to different facial expressions, illumination variations and occlusions (including random occlusion positions and different occlusion sizes), and achieves better recognition rates.