1 Introduction

Face recognition is an active topic in computer vision and pattern recognition, and many researchers have worked on it [1]. Over the past few years, various techniques have been applied to face recognition, such as principal component analysis (PCA) [2], linear discriminant analysis (LDA) [3], locality preserving projections (LPP) [4] and Eigenfaces [5]. However, these early methods handle only simple face recognition problems well. The extreme learning machine (ELM) [6] is a single-hidden-layer network that solves face recognition problems very quickly, but its recognition rate is limited. Although the sparse representation classifier (SRC) [7] outperforms methods such as PCA and LDA, it is not robust to large contiguous occlusion. Yang et al. [8] proposed a regularized robust coding (RRC) model to improve the robustness of SRC; it regresses a given signal with regularized regression coefficients. By assuming that the coding residual and the coding coefficient are each independent and identically distributed, RRC seeks a maximum a posteriori solution of the coding problem. However, it only mitigates occlusion in face recognition to a limited extent.

In face recognition, the intra-class differences caused by variable expressions, illuminations, and disguises can be shared across different subjects [9]. Based on this idea, Deng et al. [9] proposed the extended sparse representation-based classifier (ESRC). ESRC uses an intra-class variant dictionary together with the training samples to represent the test samples, which improves recognition performance for both occluded and non-occluded face images. In ESRC, a test sample is represented by several training samples from the same class, and ESRC treats all training samples within a class as equally important when they jointly represent other samples. However, different samples, even from the same class, carry different amounts of information, which suggests they should be treated distinctly.

Statistical properties of features, such as the covariance matrix and distribution characteristics, can themselves be regarded as new features for face recognition, because they reflect the characteristics of the original features. Logarithmic image processing is a mathematical framework based on abstract linear mathematics [10]. It replaces linear arithmetic operations with non-linear ones that more accurately characterize the response of the human eye [11]. Texture is a global feature, and the discrete orthonormal S-transform (DOST) [12] can measure it; hence, the DOST coefficients can be viewed as a global feature. Furthermore, DOST preserves phase information. The Gabor feature is a local feature with good spatial locality and directional selectivity. Hence, DOST and Gabor features can be combined to construct discriminative features.

In this paper, we propose an improved extended sparse representation classifier and feature descriptor (IESRCFD) method. In IESRCFD, we first define a logarithmic weighted sum (LWS) feature descriptor which combines the advantages of global features, local features, statistical properties of features and the logarithmic operation. Next, we estimate the representativeness of each atom using the proposed representativeness estimator. Then we propose an extended sparse weighted representation classifier (ESWRC) that uses this representativeness to reflect the importance of each atom when representing other samples; the atom representativeness is incorporated into the sparse representation process as a weight coefficient. Finally, IESRCFD is obtained by combining ESWRC and the LWS feature descriptor.

The main contributions of this paper are as follows.

  1. We define a logarithmic weighted sum (LWS) feature descriptor which describes images more accurately. It combines the advantages of the discrete orthonormal S-transform feature, the Gabor feature, the covariance matrix and the logarithmic operation; among these, the Gabor feature plays the most important role in our algorithm.

  2. We propose ESWRC, which considers the importance of each atom in representing the query samples by assigning a weight to each atom; this improves the recognition rate.

  3. We propose to use ESWRC and the LWS feature descriptor together as IESRCFD to achieve a very high recognition rate.

The rest of paper is organized as follows. Section 2 introduces logarithmic weighted sum feature descriptor. Section 3 introduces the proposed representativeness estimator. Section 4 is the proposed extended sparse weighted representation classifier. Section 5 is the proposed improved extended sparse representation classifier and feature descriptor (IESRCFD) method. Section 6 gives the experimental results. Section 7 is the conclusion.

2 Logarithmic Weighted Sum Feature Descriptor

In this paper, we propose a logarithmic weighted sum (LWS) feature descriptor which combines those advantages of discrete orthonormal S-transform (DOST) feature, Gabor feature, covariance and logarithmic operation. Figure 1 is the construction process of LWS feature descriptor. In Fig. 1, \( w_{1} \) and \( w_{2} \) are the weights.

Fig. 1.
figure 1

The construction process of LWS feature descriptor.

2.1 Discrete Orthonormal S-Transform (DOST) Feature

Recent work shows that a discrete orthonormal S-transform (DOST) basis can be used to accelerate the calculation of the S-transform (ST) [13] and eliminate redundancy in the space-frequency domain [14]. DOST also preserves phase information and allows an arbitrary partitioning of the frequency domain, achieving zero redundancy. These advantages give DOST a wide range of applications, such as texture classification [12] and signal analysis [14].

DOST is a pared-down version of the fully redundant ST [14]. The main idea of DOST is to form \( N \) orthogonal unit-length basis vectors, each corresponding to a specific region in the time-frequency domain. Each region is determined by three parameters: \( \nu \), the center of the frequency-domain band; \( \gamma \), the width of that band; and \( \tau \), the location in the time domain. Thus the \( \lambda \)th basis vector \( D[\lambda ]_{[\nu ,\gamma ,\tau ]} \) can be defined as

$$ D[\lambda ]_{[\nu ,\gamma ,\tau ]} = \left\{ {\begin{array}{*{20}l} {ie^{ - i\pi \tau } \frac{{e^{ - i2\alpha (\nu - \gamma /2 - 1/2)} - e^{ - i2\alpha (\nu + \gamma /2 - 1/2)} }}{2\sqrt \gamma \,\sin \alpha },} \hfill & {\alpha \ne 0} \hfill \\ { - \sqrt \gamma ie^{ - i\pi \tau } ,} \hfill & {\alpha = 0} \hfill \\ \end{array} } \right. $$
(1)

where \( \alpha = \pi (\lambda /N - \tau /\gamma ) \) represents the center of the temporal window, and \( \lambda = 0,1, \ldots ,N - 1 \).

For a one-dimensional signal \( h[\lambda ] \) of length \( N \), the DOST coefficient \( S_{[\nu ,\gamma ,\tau ]} \) for the region corresponding to the choice of \( [\nu ,\gamma ,\tau ] \) is given by formula (2).

$$ S_{[\nu ,\gamma ,\tau ]} = \frac{1}{\sqrt \gamma }\sum\limits_{f = \nu - \gamma /2}^{\nu + \gamma /2 - 1} {e^{ - i\pi \tau } } e^{{i2\pi \frac{\tau }{\gamma }f}} \left[ {\sum\limits_{\lambda = 0}^{N - 1} {e^{{ - i2\pi \frac{\lambda }{N}f}} h[\lambda ]} } \right] $$
(2)

where the content in the square brackets is \( H[f] \).

For each input image, the DOST coefficient matrix is obtained by applying the discrete orthonormal S-transform; it has the same size as the input image.
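The per-band computation in formula (2) can be sketched numerically as follows. This is a hedged illustration: the function name `dost_band` and the restriction to even band widths \( \gamma \) are our assumptions, and the inner sum over \( \lambda \) is realized with an FFT.

```python
import numpy as np

def dost_band(h, nu, gamma, tau):
    # Sketch of formula (2) for one region [nu, gamma, tau]; assumes
    # an even band width gamma so the frequency indices are integers.
    N = len(h)
    H = np.fft.fft(h)  # H[f] = sum_lambda e^{-i 2 pi lambda f / N} h[lambda]
    f = np.arange(nu - gamma // 2, nu + gamma // 2)  # band of width gamma centred at nu
    phase = np.exp(-1j * np.pi * tau) * np.exp(1j * 2 * np.pi * (tau / gamma) * f)
    return (phase * H[f % N]).sum() / np.sqrt(gamma)
```

For a constant signal all the energy lies in the lowest band: `dost_band(np.ones(4), 1, 2, 0)` returns \( 4/\sqrt{2} \).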

2.2 Gabor Wavelet Feature

The two-dimensional Gabor wavelet transform is a kind of wavelet transform with good time-frequency localization. The two-dimensional Gabor wavelet enhances low-level image features such as edges and peaks, as well as local features. The extracted Gabor feature not only has good spatial locality and directional selectivity, but is also robust to illumination and pose variations. Hence, in this paper, we choose the two-dimensional Gabor wavelet to extract the local features of an image.

For each pixel in an image, a vector \( F_{x,y} \) can be obtained by

$$ F_{x,y} = [I(x,y),x,y,\left| {G_{0,0} (x,y)} \right|, \cdots ,\left| {G_{0,7} (x,y)} \right|,\left| {G_{1,0} (x,y)} \right|, \cdots ,\left| {G_{4,7} (x,y)} \right|] $$
(3)

where \( I(x,y) \) represents the intensity value of position \( (x,y) \), \( G_{\varphi ,\sigma } (x,y) \) is the response of a two dimensional Gabor wavelet centered at \( (x,y) \) with orientation \( \varphi \) and scale \( \sigma \):

$$ G_{\varphi ,\sigma } (x,y) = \frac{{k_{\sigma }^{2} }}{{4\pi^{2} }}\sum\nolimits_{t,s} {e^{{ - \frac{{k_{\sigma }^{2} }}{{8\pi^{2} }}((x - s)^{2} + (y - t)^{2} )}} } ({\text{e}}^{{ik_{\sigma } ((x - s)\cos (\theta_{\varphi } ) + (y - t)\sin (\theta_{\varphi } ))}} - {\text{e}}^{{ - 2\pi^{2} }} ) $$
(4)

where \( k_{\sigma } = \frac{1}{{\sqrt {2^{\sigma - 1} } }} \) and \( \theta_{\varphi } = \frac{\pi \varphi }{8} \). We use 5 scales (\( \sigma = 0,1, \ldots ,4 \)) and 8 orientations (\( \varphi = 0,1, \ldots ,7 \)).

For an image \( Q \) with \( \hat{A} \) rows and \( \hat{B} \) columns, its Gabor wavelet feature \( {\mathbf{O}} \) is denoted by

$$ {\mathbf{O}} = [F_{1,1}^{T} , \cdots ,F_{{1,\hat{A}}}^{T} ,F_{2,1}^{T} \cdots ,F_{{2,\hat{A}}}^{T} , \cdots ,F_{{\hat{B},1}}^{T} , \cdots ,F_{{\hat{B},\hat{A}}}^{T} ]^{T} $$
(5)
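A kernel for formula (4) can be sketched as below. This is hedged: the truncation radius `r` and the function name `gabor_kernel` are our choices, and the sum over \( t,s \) in (4) is realized by sampling the kernel on a finite grid and convolving it with the image.

```python
import numpy as np

def gabor_kernel(phi, sigma, r=7):
    # Kernel of formula (4) sampled on a (2r+1) x (2r+1) grid;
    # r = 7 is an assumed truncation radius, not from the paper.
    k = 1.0 / np.sqrt(2.0 ** (sigma - 1))
    theta = np.pi * phi / 8.0
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    gauss = np.exp(-(k ** 2) / (8 * np.pi ** 2) * (x ** 2 + y ** 2))
    carrier = np.exp(1j * k * (x * np.cos(theta) + y * np.sin(theta))) - np.exp(-2 * np.pi ** 2)
    return (k ** 2) / (4 * np.pi ** 2) * gauss * carrier
```

The response \( G_{\varphi ,\sigma } (x,y) \) at each pixel is then the convolution of the image with this kernel, and formula (3) stacks its magnitudes into \( F_{x,y} \).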

2.3 LWS Feature Descriptor

For an image \( Q \), let \( {\mathbf{O}} \) be its Gabor wavelet feature and \( \varvec{C}_{L} \) the covariance matrix of \( {\mathbf{O}} \). Meanwhile, an image \( \hat{Q} \) of size \( 43 \times 43 \) is obtained by down-sampling \( Q \), and its DOST coefficient matrix \( {\tilde{\mathbf{O}}} \) is obtained by formula (2); the covariance matrix of \( {\tilde{\mathbf{O}}} \) is denoted by \( \varvec{C}_{G} \). Taking image \( Q \) as an example, we define a new logarithmic weighted sum (LWS) feature descriptor:

$$ \varvec{FD} = \log [\tilde{\lambda }{ \cdot }\varvec{C}_{G} \text{ + }(1 - \tilde{\lambda }){ \cdot }\varvec{C}_{L} ] $$
(6)

where \( \varvec{FD} \) is the LWS feature descriptor of \( Q \), \( \log [ \cdot ] \) represents the logarithmic operation, and \( \tilde{\lambda } \) is the weight coefficient with \( 0 \le \tilde{\lambda } \le 1 \). The LWS feature descriptors of the other images in the data set are obtained in the same way.
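Formula (6) can be sketched as follows, under the assumption that \( \log [ \cdot ] \) is the matrix logarithm of the (symmetric positive semi-definite) weighted covariance sum, computed via eigendecomposition; the eigenvalue floor is our numerical safeguard.

```python
import numpy as np

def lws_descriptor(C_G, C_L, lam=0.1):
    # Formula (6): FD = log(lam * C_G + (1 - lam) * C_L), with log read
    # as the matrix logarithm (an assumption) via eigendecomposition.
    M = lam * C_G + (1.0 - lam) * C_L
    w, V = np.linalg.eigh(M)          # symmetric input, so eigh applies
    w = np.maximum(w, 1e-12)          # guard against numerically zero eigenvalues
    return (V * np.log(w)) @ V.T
```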

3 Representativeness Estimator

In sparse representation-based methods, a test sample is represented by several atoms. Because different atoms contain different amounts of information, they generally differ in how well they can represent other samples. In this paper, we propose a representativeness estimator and use it to estimate the representativeness of each atom.

The main steps to estimate the representativeness by using representativeness estimator are as follows.

Firstly, for any sample (e.g., an image), we remove its correlation by whitening. Then its representativeness is estimated by computing its information entropy. The process is as follows.

Given an arbitrary sample (e.g., an image) \( \varvec{S} \) of size \( M \times N \).

3.1 Remove the Correlation of the Sample

  1. Remove the mean of \( \varvec{S} \).

    $$ \varvec{X} = \varvec{S} - \bar{\varvec{S}} $$
    (7)

    where \( \bar{\varvec{S}} \) is the mean of \( \varvec{S} \).

  2. \( \varvec{X} \) is arranged into a column vector.

  3. Compute the covariance matrix \( {\varvec{\Pi}} \).

    $$ {\varvec{\Pi}} = {\text{E}}[\varvec{XX}^{T} ] $$
    (8)

    where \( \text{E[} \cdot \text{]} \) represents the mathematical expectation.

  4. Perform singular value decomposition.

    $$ {\varvec{\Pi}} = {\mathbf{U}}\text{ * }{\varvec{\Lambda}}\text{ * }{\mathbf{U}}^{T} $$
    (9)

    where \( {\varvec{\Lambda}} \) is a diagonal matrix.

  5. Compute the whitening matrix \( \tilde{\varvec{M}} \).

    $$ \tilde{\varvec{M}}\text{ = }{\varvec{\Lambda}}^{ - 1/2} {\mathbf{U}}^{T} $$
    (10)
  6. Compute the whitened matrix \( \varvec{Z} \).

    $$ \varvec{Z} = \tilde{\varvec{M}}\varvec{X} $$
    (11)
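The six whitening steps above can be sketched as one function. This is hedged: the expectation in formula (8) is approximated here by the single-sample outer product, and a small eigenvalue floor stands in for exact inversion of \( {\varvec{\Lambda}} \).

```python
import numpy as np

def whiten(S):
    X = S - S.mean()                       # step (1), formula (7)
    x = X.reshape(-1, 1)                   # step (2): arrange into a column vector
    Pi = x @ x.T                           # step (3): single-sample estimate of (8)
    w, U = np.linalg.eigh(Pi)              # step (4): Pi is symmetric
    w = np.maximum(w, 1e-12)               # numerical floor for Lambda^{-1/2}
    M = np.diag(1.0 / np.sqrt(w)) @ U.T    # step (5), formula (10)
    return (M @ x).reshape(S.shape)        # step (6), formula (11), reshaped back
```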

3.2 Compute the Information Entropy

  1. \( \varvec{Z} \) is arranged into a matrix \( \tilde{\varvec{Z}} \) of size \( M \times N \).

  2. Compute the information entropy of \( \tilde{\varvec{Z}} \).

    $$ H_{{\tilde{\varvec{Z}}}} = - \sum\limits_{i = 0}^{255} {p_{i} } \log p_{i} $$
    (12)

where \( H_{{\tilde{\varvec{Z}}}} \) is the information entropy of \( \tilde{\varvec{Z}} \) and \( p_{i} \) is the proportion of pixels with grayscale value \( i \).

In this paper, \( H_{{\tilde{\varvec{Z}}}} \) represents the representativeness of image \( \varvec{S} \): the greater the value of \( H_{{\tilde{\varvec{Z}}}} \), the stronger the representativeness.
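Formula (12) can be sketched as below. Hedged assumptions: we use base-2 logarithms and 8-bit grayscale; the choice of base only rescales the entropies and cancels once they are normalized into the weights of formula (17).

```python
import numpy as np

def image_entropy(Z):
    # Formula (12): entropy of the grayscale histogram of an 8-bit image.
    hist = np.bincount(Z.astype(np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]                     # convention: 0 * log 0 = 0
    return float(-(p * np.log2(p)).sum())
```

A constant image has entropy 0, while an image split equally between two gray levels has entropy 1 bit.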

4 Extended Sparse Weighted Representation Classifier

In ESRC, atoms of the same class contain different amounts of information, so they should differ in their ability to represent other samples; that is, their importance is not equal when they jointly represent other samples. To reflect this importance, we propose an extended sparse weighted representation classifier (ESWRC).

We assume that \( \varvec{\psi}= [\varvec{\psi}_{1} ,\varvec{\psi}_{2} ,\varvec{\psi}_{3} , \ldots ,\varvec{\psi}_{k} ] \in {\Re }^{d \times n} \) are the training samples, where \( \varvec{\psi}_{i} \in {\Re }^{{d \times n_{i} }} \) are the training samples with class i. Each column of \( \varvec{\psi}_{i} \) represents a training sample with class i. That is to say, \( \varvec{\psi}_{i} \) contains \( n_{i} \) training samples.

Let \( {\mathbf{B}} = [{\mathbf{B}}_{1} - c_{1} e_{1} , \ldots ,{\mathbf{B}}_{l} - c_{l} e_{l} ] \in \Re^{d \times m} \) represents the intra-class variant bases, where \( e_{i} = [1, \ldots ,1] \in \Re^{{1 \times m_{i} }} \), \( c_{i} \in {\Re }^{d \times 1} \) is the class centroid of class i, \( {\mathbf{B}}_{i} \in \Re^{{d \times m_{i} }} \), \( i = 1,2, \ldots ,l \), \( \sum\limits_{i = 1}^{l} {m_{i} } = m \). \( {\mathbf{B}}_{i} \) are randomly selected from \( \varvec{\psi}_{i} \), and they are used to obtain the intra-class variant bases. Each column of \( {\mathbf{B}}_{i} \) represents a training sample with class \( i \). That is to say, \( {\mathbf{B}}_{i} \) contain \( m_{i} \) training samples.

The main idea of ESWRC can be illustrated by formula (13).

$$ \eta = ({\mathbf{W}}_{\varvec{\psi}} \otimes\varvec{\psi})x + ({\mathbf{W}}_{{\mathbf{B}}} \otimes {\mathbf{B}})\beta + z $$
(13)

where \( \eta \) is a test sample, \( {\mathbf{W}}_{\varvec{\psi}} \) and \( {\mathbf{W}}_{{\mathbf{B}}} \) are the weight coefficients corresponding to \( \varvec{\psi} \) and \( {\mathbf{B}} \) respectively, \( x \) and \( \beta \) are the sparse vectors, and \( z \in {\Re }^{d} \) is a noise term with bounded energy \( \left\| z \right\|_{2} < \varepsilon \). '\( \otimes \)' denotes the Hadamard (element-wise) product. Figure 2 shows the diagram of ESWRC.

Fig. 2.
figure 2

The diagram of ESWRC.

$$ {\mathbf{W}}_{\varvec{\psi}} \text{ = }[{\mathbf{W}}_{{\varvec{\psi}_{1} }} ,{\mathbf{W}}_{{\varvec{\psi}_{2} }} ,{\mathbf{W}}_{{\varvec{\psi}_{3} }} , \ldots ,{\mathbf{W}}_{{\varvec{\psi}_{k} }} ] $$
(14)
$$ {\mathbf{W}}_{{\mathbf{B}}} \text{ = }[{\mathbf{W}}_{{{\mathbf{B}}_{1} }} ,{\mathbf{W}}_{{{\mathbf{B}}_{2} }} ,{\mathbf{W}}_{{{\mathbf{B}}_{3} }} , \ldots ,{\mathbf{W}}_{{{\mathbf{B}}_{l} }} ] $$
(15)

where \( {\mathbf{W}}_{{\varvec{\psi}_{i} }} { \in }\Re^{{1 \times n_{i} }} \) corresponds to \( \varvec{\psi}_{i} \), \( i = 1,2, \ldots ,k \). \( {\mathbf{W}}_{{{\mathbf{B}}_{j} }} { \in }\Re^{{1 \times m_{j} }} \) corresponds to \( {\mathbf{B}}_{j} - c_{j} e_{j} \), \( j = 1,2, \ldots ,l \).

For all samples with class i, the expression of \( {\mathbf{W}}_{{\varvec{\psi}_{i} }} \) is as follows.

$$ {\mathbf{W}}_{{\varvec{\psi}_{i} }} \text{ = }\left[ {\Gamma _{i1} ,\Gamma _{i2} ,\Gamma _{i3} , \ldots ,\Gamma _{{in_{i} }} } \right] $$
(16)

where \( \Gamma _{ij} \) represents the contribution factor of \( j^{th} \) image in class i.

$$ \Gamma _{ij} = \frac{{H_{ij} }}{{\sum\limits_{j = 1}^{{n_{i} }} {H_{ij} } }} $$
(17)

where \( H_{ij} \) represents the information entropy of \( j^{th} \) image in class i.
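Formulas (16)-(17) then reduce to a normalization of the per-class entropies. A minimal sketch (the function name `class_weights` is ours):

```python
import numpy as np

def class_weights(entropies):
    # Formula (17): Gamma_ij = H_ij / sum_j H_ij for one class i.
    H = np.asarray(entropies, dtype=float)
    return H / H.sum()
```

For example, two atoms with entropies 1 and 3 receive weights 0.25 and 0.75.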

Let \( ({\mathbf{W}}_{\varvec{\psi}} \otimes\varvec{\psi}) =\varvec{\psi}^{ \bullet } = [\varvec{\psi}_{1}^{ \bullet } ,\varvec{\psi}_{2}^{ \bullet } ,\varvec{\psi}_{3}^{ \bullet } , \ldots ,\varvec{\psi}_{k}^{ \bullet } ] \in \Re^{d \times n} \), where each column of \( \varvec{\psi}_{i}^{ \bullet } \) represents a new sample of class i. Similarly, let \( ({\mathbf{W}}_{{\mathbf{B}}} \otimes {\mathbf{B}}) = {\mathbf{B}}^{ \bullet } = [{\mathbf{B}}_{1}^{ \bullet } - c_{1}^{ \bullet } e_{1} ,{\mathbf{B}}_{2}^{ \bullet } - c_{2}^{ \bullet } e_{2} , \ldots ,{\mathbf{B}}_{l}^{ \bullet } - c_{l}^{ \bullet } e_{l} ] \), where \( e_{i} = [1, \ldots ,1] \in \Re^{{1 \times m_{i} }} \), \( c_{i}^{ \bullet } \in \Re^{d \times 1} \) is the class centroid of class i, \( {\mathbf{B}}_{i}^{ \bullet } \in \Re^{{d \times m_{i} }} \), \( i = 1,2, \ldots ,l \), and \( \sum\nolimits_{i = 1}^{l} {m_{i} } = m \). \( {\mathbf{B}}_{i}^{ \bullet } \) are the new training samples used to obtain the intra-class variant bases; each column of \( {\mathbf{B}}_{i}^{ \bullet } \) represents a new training sample of class i, so \( {\mathbf{B}}_{i}^{ \bullet } \) contains \( m_{i} \) new training samples.

Firstly, the dimensions of \( \varvec{\psi}^{ \bullet } \), \( {\mathbf{B}}^{ \bullet } \) and \( \eta \) are reduced by PCA, yielding \( \tilde{\varvec{\psi}} = [\tilde{\varvec{\psi}}_{1} ,\tilde{\varvec{\psi}}_{2} ,\tilde{\varvec{\psi}}_{3} , \ldots ,\tilde{\varvec{\psi}}_{k} ] \in \Re^{{\tilde{d} \times n}} \), \( {\tilde{\mathbf{B}}} = [{\tilde{\mathbf{B}}}_{1} - \tilde{c}_{1} e_{1} , \ldots ,{\tilde{\mathbf{B}}}_{l} - \tilde{c}_{l} e_{l} ] \in \Re^{{\tilde{d} \times m}} \) and \( \tilde{b} \in \Re^{{\tilde{d} \times 1}} \), where \( \tilde{\varvec{\psi}}_{i} \in \Re^{{\tilde{d} \times n_{i} }} \), \( {\tilde{\mathbf{B}}}_{i} \in \Re^{{\tilde{d} \times m_{i} }} \) and \( \tilde{c}_{i} \in \Re^{{\tilde{d} \times 1}} \) correspond to \( \varvec{\psi}_{i} \), \( {\mathbf{B}}_{i} \) and \( c_{i} \) respectively.

Let \( \tilde{\varvec{A}} = [\tilde{\varvec{\psi}},{\tilde{\mathbf{B}}}] \), \( \tilde{x} = \left[ {\begin{array}{*{20}c} x \\ \beta \\ \end{array} } \right] \), \( G(\tilde{x}) = \left\| {\tilde{x}} \right\|_{1} \), \( H(\tilde{x}) = \tilde{b} - \tilde{\varvec{A}}\tilde{x} \).

We have the following sparse representation problem.

$$ \min_{{\tilde{x}}} \left\| {\tilde{x}} \right\|_{1} \quad {\text{sub}} . {\text{to}}\;\tilde{b} = \tilde{\varvec{A}}\tilde{x} $$
(18)

where \( \tilde{b} \) is the test sample that needs to be represented, \( \tilde{\varvec{A}} \) is the sparse dictionary matrix.
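The paper solves (18) with an augmented Lagrangian method; as a simpler stand-in, the sketch below uses ISTA on the relaxed problem \( \min_{\tilde{x}} \tfrac{1}{2}\| \tilde{\varvec{A}}\tilde{x} - \tilde{b} \|_{2}^{2} + \lambda \| \tilde{x} \|_{1} \), which approximates (18) for small \( \lambda \). This substitution is ours, not the paper's solver.

```python
import numpy as np

def ista(A, b, lam=0.01, iters=500):
    # ISTA for min 0.5 * ||A x - b||_2^2 + lam * ||x||_1 -- a stand-in
    # for the augmented Lagrangian solver used in the paper.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - A.T @ (A @ x - b) / L      # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return x
```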

We solve (18) with the augmented Lagrangian method and denote the solution by \( \tilde{x}^{ * } = \left[ {\begin{array}{*{20}c} {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{x} } \\ {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{\beta } } \\ \end{array} } \right] \). After that, the residual for each class is computed by

$$ R_{i} (\tilde{b}) = \left\| {\tilde{b} - \tilde{\varvec{A}}\left[ {\begin{array}{*{20}c} {\delta_{i} (\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{x} )} \\ {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{\beta } } \\ \end{array} } \right]} \right\|_{2} $$
(19)

where \( i = 1,2, \ldots ,k \), and \( \delta_{i} (\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{x} ) \) is a new vector whose only nonzero entries are the entries of \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{x} \) that are associated with class i.

The category of probe sample \( \tilde{b} \) can be obtained by

$$ \text{Identity (}\tilde{b}\text{) = arg}\mathop {\hbox{min} }\limits_{i} R_{i} (\tilde{b}) $$
(20)

Hence, the label of \( \eta \) is \( \text{Identity (}\tilde{b}\text{)} \).
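Formulas (19)-(20) can be sketched together as follows. Hedged: the function `classify` and its argument layout are ours; `labels` gives the class of each column of the \( \tilde{\varvec{\psi}} \) part of \( \tilde{\varvec{A}} \), and `beta_sol` holds the intra-class coefficients.

```python
import numpy as np

def classify(b, A, x_sol, beta_sol, labels):
    # Formulas (19)-(20): per-class residuals keeping only that class's
    # coefficients (delta_i) plus the shared intra-class part beta.
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        xc = np.where(labels == c, x_sol, 0.0)          # delta_i(x)
        coeff = np.concatenate([xc, beta_sol])
        residuals.append(np.linalg.norm(b - A @ coeff))
    return classes[int(np.argmin(residuals))]
```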

Figure 3 is the flow chart of ESWRC.

Fig. 3.
figure 3

The flow chart of ESWRC.

5 Improved Extended Sparse Representation Classifier and Feature Descriptor Method

On the one hand, the LWS feature descriptor is robust to illumination and pose variations; on the other hand, ESWRC is strongly robust in both occluded and non-occluded cases. To combine the advantages of the LWS feature descriptor and ESWRC, we propose an improved extended sparse representation classifier and feature descriptor (IESRCFD) method, which works as follows. First, the LWS feature descriptor of each image is obtained by formula (6). Then the obtained LWS feature descriptors are used as the input of ESWRC. Finally, the labels of the testing samples are obtained. The time complexity of IESRCFD is about \( O(n^{2} ) \). Figure 4 is the flow chart of IESRCFD.

Fig. 4.
figure 4

The flow chart of IESRCFD.

6 Experimental Results and Analysis

Our IESRCFD is compared with other algorithms on the FEI face database and the FERET database.

6.1 FEI Face Database

The FEI face database is a Brazilian face database containing 200 different persons, each with 14 images. All images are in color and taken against a white homogeneous background in an upright frontal position, with profile rotation of up to about 180°. Scale may vary by about 10%, and the original size of each image is \( 640 \times 480 \) pixels. In our experiments, the whole dataset is used, split equally into training and test sets: for each person, we randomly select seven images for training and use the remaining seven for testing. Each image is down-sampled to \( 64 \times 64 \). Figure 5 shows some examples of image variations from the FEI face database.

Fig. 5.
figure 5

Some examples of image variations from the FEI face database.

Table 1 lists the recognition rates of different algorithms on FEI face database.

Table 1. Recognition rates of different algorithms on FEI face database (\( \tilde{\lambda } = 0.1 \))

From Table 1 we can see that our IESRCFD achieves the highest recognition rate, 91.6%, with about 35%, 12%, 12%, 25%, 30%, 11%, 20% and 14% improvement over RRC, ESRC, SRC, KNN, ELM, ELMSRC, RSC and KSRC respectively.

6.2 FERET Database

As in [20, 21], the “b” subset of the FERET database is used to verify the recognition performance of the various algorithms. It consists of 198 subjects, each with 7 images. The training set contains frontal face images with neutral expression (“ba”), smiling expression (“bj”), and illumination changes (“bk”), while the test set contains face images with varying pose angle: “bd” (\( +25^{ \circ } \)), “be” (\( +15^{ \circ } \)), “bf” (\( -15^{ \circ } \)), and “bg” (\( -25^{ \circ } \)). As in [20], all images are down-sampled to \( 64 \times 64 \). Some samples of face images from the “b” subset of the FERET database are shown in Fig. 6.

Fig. 6.
figure 6

Some samples of face images in “b” subset of FERET database.

Table 2 lists the recognition rates of different algorithms on the “b” subset of FERET database.

Table 2. Recognition rates of different algorithms on “b” subset of FERET database (\( \tilde{\lambda } = 0.1 \))

Table 2 shows that when the test sets are the “bd”, “be”, “bf” and “bg” subsets, the recognition rates of our IESRCFD are 84.5%, 99.0%, 99.5% and 90.5% respectively, higher than those of the other algorithms. The average recognition rate of IESRCFD is 93.4%, which is not only the highest but also about 7%, 30%, 37%, 42%, 33%, 39%, 31% and 35% better than GSRC, LogE-SR, TSC, ESRC, KNN, ELM, ELMSRC and KSRC respectively.

7 Conclusion

In this paper, we propose an improved extended sparse representation classifier and feature descriptor (IESRCFD) method by using the proposed LWS feature descriptor and ESWRC simultaneously. Experimental results show that IESRCFD outperforms many algorithms: its recognition rate is about 20%, 14%, 12% and 11% higher than those of RSC, KSRC, ESRC and ELMSRC respectively on the FEI face database, and about 31%, 7% and 4% higher than those of ELMSRC, GSRC and RSR respectively on the FERET database. Meanwhile, the experimental results illustrate that IESRCFD converges and meets real-time requirements.