1 Introduction

Among biometric recognition technologies, face recognition has become one of the most active areas of pattern recognition and computer vision, owing to its huge application potential in public safety, intelligent access control, criminal investigation and other fields, together with its strong adaptability, high security and non-contact interaction. A traditional face recognition pipeline generally includes four steps: face detection, image preprocessing, feature extraction and classification, among which feature extraction and classification are the core. Under ideal conditions, namely when the reference image is acquired under the same conditions as the image to be recognized, the subject is enrolled in the database and the database size is moderate, traditional face recognition methods can achieve satisfactory results. In practice, the situation is more complex: changes in illumination, appearance and facial expression, as well as aging and occlusion, all pose new demands and challenges for traditional methods. Initially, face recognition was studied as a general pattern recognition problem and was mainly implemented with methods based on geometric structural features of the face; these produced no very significant research results and were never applied in practice. Afterwards, many classic face recognition algorithms were proposed during a period of rapid progress [14, 21-23, 28], and the well-known Eigenface [22] and Fisherface [21] methods date from this period. The finding that template matching outperforms geometric feature methods, together with the success of Eigenface, ended research on face recognition based on structural features. It also greatly promoted the development of appearance-based linear subspace modeling [16, 24] and of face recognition based on statistical pattern recognition [13, 19]. Overall, the face recognition algorithms proposed in this period perform well under ideal conditions and have been applied in commercial face recognition systems.

With the development of face recognition technology, recognition systems can achieve satisfactory performance under controlled conditions. However, the mainstream techniques could not handle illumination and pose changes, expression changes, aging, occlusion, and low-quality images, which cause the recognition rate to drop significantly. Researchers therefore began to study face recognition in uncontrolled environments. Illumination and pose changes are the factors most often considered in the face recognition problem. Illumination change manifests mainly as changes in illumination intensity and angle. The gray-level distribution difference between two images of the same face under different lighting can exceed that between images of different faces under the same lighting, which lowers the recognition rate. Similarly, in multi-pose face recognition, when the face pose changes within or outside the image plane, the difference between images of the same person under different poses can be greater than that between images of different persons under the same pose, causing recognition errors.

To achieve robustness against illumination and pose changes, models have expanded from 2D to 3D. Existing methods for coping with illumination can be divided into two classes: illumination compensation based on transformation and illumination compensation based on illumination sample synthesis. Transformation-based illumination compensation is the earlier approach, mainly including histogram equalization [29], nonlinear transformation [5] and homomorphic filtering [7]; these methods cannot produce ideal face images under extreme illumination conditions. Methods based on illumination sample synthesis estimate the illumination condition of the face image to be recognized and simulate a standard illumination condition, thus normalizing the image to standard illumination before recognition. Existing illumination sample synthesis methods can be divided into four classes: methods based on invariant features, on illumination change models, on face image normalization, and on shape from shading (SFS) [25]. The first class is represented by the quotient image (QI) [20], the light cone method [27] and the spherical harmonics method [1]. The latter three classes all need to estimate 3D information of the face from 2D images.

Multi-pose face recognition methods based on 2D images mainly include methods based on multiple views [26], pose-invariant features [12, 16] and deformation models [3, 11, 15]. Methods based on multiple views need several images of each person from different perspectives; the images are divided into subsets according to pose, and a model is trained for each pose subset. At recognition time, the method first estimates which typical pose the face to be recognized is closest to and then performs recognition within the corresponding subset. Pose-invariant feature methods try to find a feature independent of pose to solve the multi-pose face recognition problem; such methods are computationally complex, and the range of pose change they can handle is small. Methods based on deformation models handle the multi-pose problem by synthesizing new virtual views, generating images of each pose from a single known face image by various means. 3D model methods generally establish a shape model to estimate shape and texture parameters, or build a 3D face model of the face to be recognized [2, 6, 8]; recognition is then carried out on the estimated parameters or on a generated frontal face image. Although 3D model methods can achieve satisfactory recognition rates, they incur a high computational cost.

Feature extraction is an important step in face recognition and directly affects the recognition result. After decades of development, feature extraction methods are no longer limited to the classic Eigenface and Fisherface, and many different feature extraction methods have been applied to face recognition. In recent years, feature extraction methods based on manifold learning have been proposed; they perform feature extraction by mining the nonlinear manifold structure within the data. However, the feature mappings obtained this way are defined only on the training data: a new test sample has no image in the embedded space. This is the so-called out-of-sample problem. To address it, several linear approximation methods were developed, among them locality preserving projection (LPP) [10]. Nonetheless, other problems in manifold learning methods, such as the choice of neighborhood size and of other parameters, still trouble researchers. Inspired by sparse representation, the sparse preserving projection (SPP) method was proposed [18]. It selects the number of neighbors automatically through sparse representation; the weights in the neighborhood graph are no longer computed by a fixed formula but are obtained from the sparse representation process. Neighbor selection and weight determination are thus carried out simultaneously, which solves the above problems well. Although the sparse representation process gives SPP a certain discriminative ability, SPP is still an unsupervised feature extraction method that does not use class information. As far as face recognition is concerned, discriminative information plays an important role in improving the recognition rate. This paper therefore improves SPP, inspired by LPP: discriminative information is introduced to obtain a face feature extraction method with stronger discrimination.
At the same time, statistical uncorrelation constraints are added to decrease the redundancy among eigenvectors, yielding the proposed UDSPP method. The rest of the paper is organized as follows. Section 2 introduces the related concepts of sparse preserving projection and sparse reconstruction. Section 3 proposes the UDSPP method. Section 4 reports recognition experiments with the proposed algorithm and analyzes its performance. Section 5 concludes the paper.

2 Sparse preserving projection and sparse reconstruction

2.1 Sparse representation

In traditional signal representation, a signal is decomposed on a set of complete orthogonal bases, and the decomposition is often accompanied by complex computation. Building on wavelet analysis, Mallat et al. proposed the idea of sparse representation, which exploits the sparsity of the signal for modeling: a small number of non-zero coefficients reflect the internal structure and essential properties of the signal, so that its expression is simple. Sparse representation thus opened up a new direction in signal representation research.

In general, a signal can be expressed as a linear combination of basis vectors \( \{t_i\}_{i=1}^M \): given a signal \( f\in {\mathbb{R}}^N \), it can be represented linearly by \( \{t_i\}_{i=1}^M \), \( t_i\in {\mathbb{R}}^N \):

$$ f=T\alpha =\left[\begin{array}{cccc}{t}_1 & {t}_2 & \cdots & {t}_N\end{array}\right]\left[\begin{array}{c}{\alpha}_1 \\ {}{\alpha}_2 \\ {}\vdots \\ {}{\alpha}_N\end{array}\right], $$
(1)

where α is the coefficient vector. In practice, commonly used complete orthogonal bases include the Fourier basis, wavelet bases and the cosine basis; in these cases M = N.

In sparse representation, the complete orthogonal basis is replaced by a redundant basis, namely M ≫ N. The vectors \( t_i, i=1,\cdots,M \) are then no longer linearly independent, and \( D=\{t_i\}_{i=1}^M \), M ≫ N, is called a dictionary. When the redundant dictionary is used to express the signal, the overcomplete representation of the signal f is:

$$ f=D\alpha . $$
(2)

Given f and D, the above formula is an underdetermined problem with infinitely many solutions α. It is this underdeterminedness that makes sparse representation of the signal possible.

The goal of sparse representation is to find the sparsest among all possible solutions of (2), namely the one containing the fewest non-zero elements. The number of non-zero elements of a vector α is denoted \( {\left\Vert \alpha \right\Vert}_0 \), so the sparse representation model can be expressed as follows:

$$ \arg \underset{\alpha }{ \min }{\left\Vert \alpha \right\Vert}_0\quad s.t.\quad f=D\alpha . $$
(3)

Since solving (3) is an NP-hard problem, minimization of the \( \ell_1 \) norm is usually used as an approximation, giving the following model:

$$ \arg \underset{\alpha }{ \min }{\left\Vert \alpha \right\Vert}_1\quad s.t.\quad f=D\alpha . $$
(4)
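As a concrete illustration, problem (4) can be cast as a linear program and solved with an off-the-shelf solver. The sketch below is our own illustrative choice (the `basis_pursuit` helper, the Gaussian dictionary and the sparsity level are not part of the original formulation): it splits α into its positive and negative parts so that the ℓ₁ norm becomes a linear objective.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, f):
    """Solve min ||alpha||_1  s.t.  f = D alpha  (Eq. (4)) as a linear program.

    Writing alpha = u - v with u, v >= 0 gives ||alpha||_1 = 1^T (u + v),
    so the l1 problem becomes min 1^T [u; v]  s.t.  [D, -D][u; v] = f.
    """
    N, M = D.shape
    c = np.ones(2 * M)                     # objective: sum(u) + sum(v)
    A_eq = np.hstack([D, -D])              # D (u - v) = f
    res = linprog(c, A_eq=A_eq, b_eq=f, bounds=(0, None), method="highs")
    u, v = res.x[:M], res.x[M:]
    return u - v

# Redundant Gaussian dictionary (M >> N) and a 2-sparse signal
rng = np.random.default_rng(0)
D = rng.standard_normal((15, 50))
alpha_true = np.zeros(50)
alpha_true[[3, 17]] = [1.5, -2.0]
f = D @ alpha_true
alpha = basis_pursuit(D, f)
```

For such a mildly sparse signal the ℓ₁ solution typically coincides with the sparsest one; in any case it reconstructs f exactly while having an ℓ₁ norm no larger than that of any other feasible solution.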

After decades of development, compressed sensing theory, built on sparse representation theory, was put forward. It breaks through the limit of the Nyquist sampling theorem and has attracted the interest of many scholars [4]. Meanwhile, with the continual improvement of sparse representation theory and of various optimization algorithms, sparse representation has been applied to many aspects of image processing and has become one of its hot fields.

2.2 Sparse matrix reconstruction

Feature extraction uses a linear or nonlinear projection to map samples from the high-dimensional space in which they lie to a corresponding low-dimensional feature space. Besides reducing dimensionality to decrease the computational complexity of recognition, feature extraction also helps to reduce noise, so that the extracted features have better separability.

A general feature extraction method can be expressed as follows. Given a data sample set \( \{x_i\}_{i=1}^n \), where \( x_i\in {\mathbb{R}}^m \) is an m-dimensional column vector and n is the number of samples, find an appropriate linear transformation matrix \( W\in {\mathbb{R}}^{m\times d} \) (d ≪ m) to project the original sample points to a new low-dimensional feature space, namely \( y_i=W^Tx_i \); y_i is the obtained low-dimensional feature.

Sparse preserving projection is a feature extraction method suited to face recognition proposed in recent years. It finds the projection matrix by preserving the sparse reconstruction relationship before and after feature extraction. Compared with methods such as LPP that determine the neighborhood relationship by k-nearest neighbors, determining the neighborhood relationship through sparse representation is more reasonable. At the same time, neighborhood selection and weight matrix determination in SPP are carried out simultaneously.

Given a data sample set \( \{x_i\}_{i=1}^n \) with \( x_i\in {\mathbb{R}}^m \), let \( X=[x_1,x_2,\cdots,x_n]\in {\mathbb{R}}^{m\times n} \) be the matrix containing all samples as column vectors. The target of sparse representation is to express an unknown sample x with as few elements of X as possible. The model for solving the weight matrix in SPP can be expressed as follows:

$$ \begin{array}{r}\hfill { \min}_{s_i}{\left\Vert {s}_i\right\Vert}_1\\ {}\hfill s.t.{x}_i=X{s}_i\\ {}\hfill 1={1}^T{s}_i\end{array}, $$
(5)

where \( s_i=[s_{i,1},s_{i,2},\cdots,s_{i,i-1},0,s_{i,i+1},\cdots,s_{i,n}]^T \) is an n-dimensional vector whose i-th element is 0. In other words, x_i is represented linearly by the elements of X other than x_i, and the element \( s_{i,j}, j\ne i \) indicates the contribution of x_j to the reconstruction of x_i. After all \( s_i, i=1,\cdots,n \) have been obtained, the sparse reconstruction matrix is \( S=[s_1,\cdots,s_n]^T \).
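A small sketch of how the weight matrix S of (5) might be computed, again via linear programming. The toy data and the feasibility assumption n − 1 > m are ours; on raw face images, where m ≫ n, the equality constraint would first require dimensionality reduction or an error tolerance.

```python
import numpy as np
from scipy.optimize import linprog

def spp_weights(X):
    """Row i of the returned S solves Eq. (5): min ||s_i||_1 subject to
    x_i = X s_i and 1 = 1^T s_i, with the i-th entry fixed to zero."""
    m, n = X.shape
    S = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]      # exclude x_i itself
        Xi, k = X[:, idx], n - 1
        # variables [u; v] with s = u - v, u, v >= 0
        A_eq = np.vstack([np.hstack([Xi, -Xi]),
                          np.hstack([np.ones(k), -np.ones(k)])])
        b_eq = np.append(X[:, i], 1.0)             # x_i = Xi s and sum(s) = 1
        res = linprog(np.ones(2 * k), A_eq=A_eq, b_eq=b_eq,
                      bounds=(0, None), method="highs")
        S[i, idx] = res.x[:k] - res.x[k:]
    return S

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 20))                   # 20 toy samples in R^5
S = spp_weights(X)
```

Each row of S both selects the "neighbors" (its non-zero entries) and supplies their weights, which is exactly the simultaneous selection-and-weighting property of SPP described above.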

The objective function of SPP is defined as [18]:

$$ { \min}_w{\displaystyle \sum_{i=1}^n{\left\Vert {w}^T{x}_i-{w}^TX{s}_i\right\Vert}^2}. $$
(6)

By algebraic manipulation, the above objective function can be converted into the following form:

$$ {\displaystyle \sum_{i=1}^n{\left\Vert {w}^T{x}_i-{w}^TX{s}_i\right\Vert}^2}={w}^TX\left(I-S-{S}^T+{S}^TS\right){X}^Tw. $$
(7)

Setting \( {S}_{\alpha }=I-S-{S}^T+{S}^TS \) and introducing the constraint \( {w}^TX{X}^Tw=1 \) to avoid degenerate solutions, the SPP model can be transformed into the following optimization problem:

$$ { \min}_w\frac{w^TX{S}_{\alpha }{X}^Tw}{w^TX{X}^Tw}. $$
(8)

Its solution w is the eigenvector corresponding to the minimum eigenvalue of the following generalized eigenvalue problem:

$$ X{S}_{\alpha }{X}^Tw=\lambda X{X}^Tw. $$
(9)

Let \( w_1,w_2,\cdots,w_d \) be the eigenvectors corresponding to the d smallest eigenvalues, namely \( \lambda_1\le \lambda_2\le \cdots \le \lambda_d \); the obtained projection matrix is \( W=[w_1,w_2,\cdots,w_d] \).
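In code, once S is available, solving (9) is a symmetric-definite generalized eigenproblem; a possible sketch follows (the small ridge term that keeps XX^T positive definite, and the toy data, are our own additions):

```python
import numpy as np
from scipy.linalg import eigh

def spp_projection(X, S, d):
    """Return W = [w_1, ..., w_d]: eigenvectors of
    X S_alpha X^T w = lambda X X^T w  (Eq. (9)) with the d smallest
    eigenvalues. eigh(A, B) returns eigenvalues in ascending order."""
    n = X.shape[1]
    S_alpha = np.eye(n) - S - S.T + S.T @ S    # Eq. (7)
    A = X @ S_alpha @ X.T                      # symmetric
    B = X @ X.T + 1e-8 * np.eye(X.shape[0])    # ridge keeps B positive definite
    _, vecs = eigh(A, B)
    return vecs[:, :d]

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 20))
S = 0.1 * rng.standard_normal((20, 20))        # stand-in reconstruction matrix
np.fill_diagonal(S, 0.0)
W = spp_projection(X, S, d=3)
Y = W.T @ X                                    # low-dimensional features y_i = W^T x_i
```

Since `eigh` normalizes the eigenvectors to be B-orthonormal, the constraint \( w^TXX^Tw=1 \) is satisfied automatically.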

3 UDSPP method

Compared with LPP, SPP constructs the neighborhood graph through sparse representation and thus avoids choosing the neighborhood size k. Moreover, k-nearest-neighbor selection uses Euclidean distance as its measure, which is sensitive to noise in the data: any noise disturbs the neighborhood graph. Thanks to sparse representation, SPP avoids this problem effectively and is therefore more practical. For face recognition, a supervised feature extraction method can better mine the discriminative information in the data and thus improve recognition performance. To enhance the discrimination of SPP, this paper improves it by adding discriminative information, so that the resulting feature extraction method, UDSPP, is better suited to face recognition. The statistical uncorrelation constraint is introduced to reduce the redundancy among eigenvectors and improve the effectiveness of feature extraction.

3.1 Discriminative sparse preserving projection

First, the objective function of the discriminative sparse preserving projection is given as:

$$ \min \frac{{\displaystyle {\sum}_{c=1}^C{\displaystyle {\sum}_{i=1}^{n_c}{\left({y}_i^c-{\displaystyle {\sum}_{j=1}^{n_c}{s}_{ij}^c{y}_j^c}\right)}^2}}}{{\displaystyle {\sum}_{i,j=1}^C{\left({m}_i-{m}_j\right)}^2}}, $$
(10)

where C is the number of classes, n_c is the number of samples in the c-th class, and \( {y}_i^c \) is the projection of the i-th sample of the c-th class into the feature space. The vectors m_i and m_j are the means of the i-th and j-th class samples in the feature space respectively, namely \( {m}_i=\frac{1}{n_i}{\displaystyle {\sum}_{k=1}^{n_i}{y}_k^i} \) and \( {m}_j=\frac{1}{n_j}{\displaystyle {\sum}_{k=1}^{n_j}{y}_k^j} \). We want the optimal projection to maximize the distance between classes while preserving the intra-class sparse reconstruction relationship, so as to enhance discrimination.

With simple algebraic manipulation, the numerator of the objective function can be transformed into the following form:

$$ \begin{array}{l}{\displaystyle {\sum}_{c=1}^C{\displaystyle {\sum}_{i=1}^{n_c}{\left({y}_i^c-{\displaystyle {\sum}_{j=1}^{n_c}{s}_{ij}^c{y}_j^c}\right)}^2}}={\displaystyle {\sum}_{c=1}^C{\displaystyle {\sum}_{i=1}^{n_c}{\left({w}^T{x}_i^c-{\displaystyle {\sum}_{j=1}^{n_c}{s}_{ij}^c{w}^T{x}_j^c}\right)}^2}}\\ {}={\displaystyle {\sum}_{c=1}^C{\displaystyle {\sum}_{i=1}^{n_c}{\left({w}^T{x}_i^c-{w}^T{X}_c{s}_i^c\right)}^2}}={\displaystyle {\sum}_{c=1}^C{w}^T{X}_c\left({I}_c-{S}_c-{S}_c^T+{S}_c^T{S}_c\right){X}_c^Tw}\\ {}={w}^TX\left(I-{S}_{\beta }-{S}_{\beta}^T+{S}_{\beta}^T{S}_{\beta}\right){X}^Tw,\end{array} $$
(11)

where \( I=\left[\begin{array}{ccc}\hfill {I}_1\hfill & \hfill 0\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill \ddots \hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill 0\hfill & \hfill {I}_C\hfill \end{array}\right] \), with I_c the n_c × n_c identity matrix, \( {S}_{\beta }=\left[\begin{array}{ccc}\hfill {S}_1\hfill & \hfill 0\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill \ddots \hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill 0\hfill & \hfill {S}_C\hfill \end{array}\right] \), and S_c the sparse reconstruction coefficient matrix within the c-th class, c = 1, ⋯, C.
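When the columns of X are sorted by class, S_β is simply the block-diagonal stack of the per-class matrices, for example (the class sizes and random values below are purely illustrative):

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical per-class reconstruction matrices S_c for C = 3 classes
# with n_c = 2, 3, 2 samples; values are illustrative only.
rng = np.random.default_rng(3)
S_c = [0.1 * rng.standard_normal((n_c, n_c)) for n_c in (2, 3, 2)]
for M_c in S_c:
    np.fill_diagonal(M_c, 0.0)       # a sample never reconstructs itself

S_beta = block_diag(*S_c)            # the block-diagonal S_beta of Eq. (11)
```

The zero off-diagonal blocks encode the fact that, in the intra-class formulation, a sample is reconstructed only from samples of its own class.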

The denominator of the objective function can be converted into:

$$ \begin{array}{l}{\displaystyle \sum_{i,j=1}^C{\left({m}_i-{m}_j\right)}^2}={\displaystyle \sum_{i,j=1}^C{\left(\frac{1}{n_i}{\displaystyle \sum_{k=1}^{n_i}{y}_k^i}-\frac{1}{n_j}{\displaystyle \sum_{k=1}^{n_j}{y}_k^j}\right)}^2}\\ {}={\displaystyle \sum_{i,j=1}^C\left[{w}^T\left({f}_i-{f}_j\right){\left({f}_i-{f}_j\right)}^Tw\right]}=w{}^TS_bw,\end{array} $$
(12)

where \( {f}_i=\frac{1}{n_i}{\displaystyle \sum_{k=1}^{n_i}{x}_k^i} \) is the mean vector of the i-th class samples in the original space and S_b is the between-class scatter matrix.

Setting \( {S}_{\gamma }=I-{S}_{\beta }-{S}_{\beta}^T+{S}_{\beta}^T{S}_{\beta} \), the objective function is converted into the following form:

$$ { \min}_w\frac{w^TX{S}_{\gamma }{X}^Tw}{w^T{S}_bw}. $$
(13)

It should be noted that when the number of samples per person in the training database decreases, an image can no longer be expressed linearly using only samples of the same person, which makes the sparse reconstruction relationship inaccurate: the reconstruction weight matrix cannot reflect the relationships among samples well. It is known that sparse representation itself carries a certain discriminative ability [8]. We therefore replace the intra-class sparse reconstruction relationship with the global reconstruction relationship when there are few training samples per person. In that case, the numerator of the discriminative sparse preserving projection is that of SPP, namely:

$$ { \min}_w\frac{w^TX{S}_{\alpha }{X}^Tw}{w^T{S}_bw}. $$
(14)

In this way, each sample can be expressed accurately, and samples of the same class still make the greater contribution, so the relationships among same-class samples are also well represented. The experimental results show that this replacement is reasonable.

3.2 UDSPP model

To make the extracted features statistically uncorrelated and thus decrease the redundancy among eigenvectors, a statistical uncorrelation constraint is added to the objective function, namely \( {w}^T{S}_tw=1 \), where S_t is the total scatter matrix. The proposed UDSPP model is then:

$$ \begin{array}{l}{ \min}_{w^T{S}_tw=1}\frac{w^TX{S}_{\gamma }{X}^Tw}{w^T{S}_bw}={ \min}_{w^T{S}_tw=1}\frac{w^TX{S}_{\gamma }{X}^Tw}{w^T\left({S}_t-{S}_w\right)w}\\ {}={ \min}_{w^T{S}_tw=1}\frac{w^TX{S}_{\gamma }{X}^Tw}{1-{w}^T{S}_ww},\end{array} $$
(15)

where S_w is the within-class scatter matrix. According to [6], S_t can be written as XGX^T and S_w as XMX^T, where G = I − (1/n)ee^T, e = (1, ⋯, 1)^T and M = I − E, with E_ij = 1/n_c if x_i and x_j belong to the c-th class and E_ij = 0 otherwise. The constrained optimization problem can then be transformed into the following form:

$$ { \min}_{w^T{S}_tw=1}{w}^TX\left({S}_{\gamma }+M\right){X}^Tw. $$
(16)

Its solution w is the eigenvector corresponding to the smallest eigenvalue of the following generalized eigenvalue problem:

$$ X\left({S}_{\gamma }+M\right){X}^Tw=\lambda XG{X}^Tw. $$
(17)

Let \( w_1,w_2,\cdots,w_d \) be the eigenvectors corresponding to the d smallest eigenvalues, namely \( \lambda_1\le \lambda_2\le \cdots \le \lambda_d \); the obtained projection matrix is \( W=[w_1,w_2,\cdots,w_d] \). This yields UDSPP with the statistically uncorrelated feature.
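Putting Sections 3.1 and 3.2 together, the whole UDSPP training step reduces to one generalized eigenproblem. The sketch below mirrors Eqs. (16)-(17); the toy sizes, the random per-class weights and the small ridge on XGX^T are our own assumptions, not part of the original method.

```python
import numpy as np
from scipy.linalg import eigh, block_diag

def udspp(X, labels, S_blocks, d):
    """Sketch of UDSPP: solve X (S_gamma + M) X^T w = lambda X G X^T w
    (Eq. (17)) and keep the eigenvectors of the d smallest eigenvalues.
    Columns of X must be sorted by class, matching S_blocks."""
    m, n = X.shape
    S_beta = block_diag(*S_blocks)
    I = np.eye(n)
    S_gamma = I - S_beta - S_beta.T + S_beta.T @ S_beta
    G = I - np.ones((n, n)) / n                 # S_t = X G X^T
    E = np.zeros((n, n))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        E[np.ix_(idx, idx)] = 1.0 / idx.size    # E_ij = 1/n_c within class c
    M = I - E                                   # S_w = X M X^T
    A = X @ (S_gamma + M) @ X.T
    B = X @ G @ X.T + 1e-8 * np.eye(m)          # ridge keeps B positive definite
    _, vecs = eigh(A, B)
    return vecs[:, :d]

rng = np.random.default_rng(4)
labels = np.repeat([0, 1, 2], 4)                # 3 classes, 4 samples each
X = rng.standard_normal((5, 12))
S_blocks = [0.1 * rng.standard_normal((4, 4)) for _ in range(3)]
for B_c in S_blocks:
    np.fill_diagonal(B_c, 0.0)
W = udspp(X, labels, S_blocks, d=3)
Y = W.T @ X                                     # statistically uncorrelated features
```

Because `eigh` makes the eigenvectors B-orthonormal, the constraint w^T S_t w = 1 holds by construction for each column of W.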

4 Recognition experiment and performance analysis

4.1 Face recognition database

Before the recognition experiments, we first introduce the face databases used; sample images are also provided. The Extended Yale B database [12, 17] contains 2414 frontal face images of 38 individuals, collected in a controlled laboratory environment under varying lighting conditions. After cropping and normalization, the image size is 192 × 168. Figure 1 shows some images from the database under moderate and brighter illumination.

Fig. 1: Part of the images from the Extended Yale B database

The images in the FERET database were collected in a semi-controlled environment [9]. The complete database contains 1564 sets and 14126 images. Figure 2 shows some images from the FERET database.

Fig. 2: Images from the FERET database

4.2 Experiment results on different face database

4.2.1 Experiment result on extended Yale B

First, experiments were carried out on the Extended Yale B database. The proposed UDSPP method was compared with SPP and LDA. We randomly selected 10 images per individual for training and another 10 for testing, with no overlap between training and test images. The experiment was repeated 20 times. Table 1 shows the highest average recognition rate and the corresponding dimension for the three feature extraction methods using the nearest neighbor classifier. Figure 3 shows the average recognition rate at each dimension.

Table 1 The highest average recognition rate and corresponding dimension (%)
Fig. 3: Average recognition rate comparison under different dimensions

As can be seen from Table 1 and Fig. 3, the recognition rate of UDSPP is significantly higher than that of SPP once discriminative information is added. Compared with the traditional LDA method, since the Extended Yale B database contains extreme illumination changes, UDSPP and SPP achieve higher recognition rates than LDA, because both capture the sparse reconstruction relationships among samples: UDSPP through its intra-class sparse preserving structure and SPP through its global sparse preserving structure.

4.2.2 Experiment result on FERET database

The comparison on the FERET database took the following form: 4 or 5 images per individual were selected for training each time, with the remaining images used for testing. Table 2 shows the highest average recognition rate and the corresponding dimension.

Table 2 The highest average recognition rate and corresponding dimension

It can be seen from Table 2 that, on a general frontal face database without significant illumination effects, the recognition rate of SPP is lower than that of LDA. This is because LDA has strong discriminative power; when distance computation is not adversely affected, this is more conducive to face recognition. UDSPP, however, combines the advantages of SPP and LDA, so its recognition rate is higher than both. The table also shows that the recognition rate of UDSPP decreases significantly with 4 training samples per individual. Our analysis is that with too few training samples, the intra-class reconstruction matrix cannot reflect the intra-class sample relationships well. In accordance with the earlier analysis, the intra-class sparse reconstruction relationship is then replaced with the global sparse reconstruction relationship. Figures 4 and 5 show the average recognition rate at each dimension for four methods, including G-UDSPP, the variant based on global sparse representation.

Fig. 4: Average recognition rate with 4 training samples

Fig. 5: Average recognition rate with 5 training samples

4.2.3 Performance comparison with DSPP

We also compare UDSPP with another existing discriminative sparse preserving projection (DSPP) [32]. DSPP directly combines sparse preserving projection with linear discriminant analysis to arrive at the following objective function:

$$ \begin{array}{l} \min {w}^T\left(X{S}_{\alpha }{X}^T-\gamma \left({S}_b-{S}_w\right)\right)w\\ {}s.t.{w}^TX{X}^Tw=1.\end{array} $$
(18)

The comparison results are shown in Figs. 6 and 7. For the Extended Yale B database, 10 samples per individual are again randomly selected for training and another 10 for testing. For the FERET database, 5 samples per individual are used for training to compute the average recognition rate.

Fig. 6: Average recognition rate at each dimension on Extended Yale B

Fig. 7: Average recognition rate at each dimension on FERET

As can be seen from Figs. 6 and 7, the proposed method achieves a recognition rate comparable to that of DSPP as a face feature extraction method, and UDSPP even shows a certain degree of superiority. Table 3 shows the specific recognition rates at the corresponding dimensions, where MRA denotes the maximum recognition rate.

Table 3 Average recognition rate comparison of DSPP and UDSPP (%)

It can be seen from Table 3 that the results of UDSPP differ from those of DSPP. Table 4 shows the recognition results for different training and test samples at 80 and 120 dimensions.

Table 4 Recognition rate comparison under different training test samples (%)

From Table 4 it can be seen that UDSPP behaves quite differently from DSPP on specific training/test instances: for the same training and test sets, the two methods reach quite different recognition results at the same dimension.

5 Conclusion

Based on sparse preserving projection and linear discriminant analysis, this paper proposed a new discriminative sparse preserving projection method. Meanwhile, a statistical uncorrelation constraint was introduced to decrease the redundancy among eigenvectors. The experimental results show that the recognition rate is significantly improved compared with SPP after adding discriminative information. At the same time, compared with Euclidean-distance-based LDA, the method also has certain advantages, especially on face databases with strong illumination changes. On a general face database, if there are too few samples per class, the recognition rate of UDSPP also declines, because the shortage of intra-class samples makes the intra-class sparse reconstruction relationship inaccurate. In such circumstances, the global sparse representation should be used instead of the intra-class one to compensate for the sample shortage. Overall, UDSPP is effective as a face feature extraction method; improving it and extending its scope of application will be the focus of our future research.