1 Introduction

Face recognition (FR) has been an active research topic in computer vision and pattern recognition due to its importance in many real-world applications, such as public security monitoring, access control and card identification. Although face images usually have high dimensionality, they possibly reside on a low-dimensional subspace[1,2]. Many subspace learning methods, such as principal component analysis (PCA)[3], linear discriminant analysis (LDA)[4], locality preserving projection (LPP)[5], marginal Fisher analysis (MFA)[6], local discriminant embedding (LDE)[7] and sparsity preserving discriminant embedding (SPDE)[8], have been proposed for reducing face image dimensionality. Subsequently, a classifier, such as the nearest neighbor classifier (NNC)[9] or support vector machine (SVM), is usually employed for classification. Although subspace learning methods have achieved good accuracy in FR, they are not robust to corrupted face images[10], where the corruption is caused by occlusion, disguise or pixel contamination. In particular, when the number of training samples is small, the learned subspace projections may be wrong or deficient[11].

Recently, a new robust FR framework, sparse representation-based classification (SRC)[12], was presented. SRC sparsely encodes a test sample over a dictionary consisting of the training samples via L1-norm optimization techniques[13-15]. Then, the test sample is classified into the class that yields the minimal reconstruction residual. For test samples suffering from corruption and occlusion, SRC appends an identity matrix as an occlusion dictionary. However, the dimensionality of the identity matrix is usually very high, which renders sparse coding computationally expensive. To solve this issue, Deng et al.[16] proposed an extended SRC (ESRC) method, in which an intra-class variation matrix, obtained by subtracting the class centroid from the samples of the same class, is used as the occlusion dictionary. The dimensionality of the occlusion dictionary used in ESRC is much smaller than that of the occlusion dictionary used in SRC, which greatly improves computational efficiency[16].

Although SRC and its extensions have been successfully used in FR, their working mechanism has not been clearly revealed. Recently, Zhang et al.[17] proposed an efficient FR method, namely collaborative representation-based classification (CRC). They argued that it is the collaborative representation over all training samples, rather than the L1-norm sparsity constraint, that plays the key role in the success of SRC[17]. Therefore, they replaced the L1-norm with the computationally much more efficient L2-norm and showed that CRC achieves FR performance competitive with SRC. Naseem et al.[18] presented another L2-norm based classification method, namely linear regression-based classification (LRC), built on the assumption that samples from the same subject reside on a linear subspace. LRC linearly encodes a test sample over the class-specific training samples by least squares and classifies the test sample based on the class-specific reconstruction residuals.

Recently, Liu et al.[19] presented a low-rank representation (LRR) method to reveal the global structure of data that are drawn from multiple subspaces. Lu et al.[20] incorporated a graph regularization term into the LRR to encode the local structure information of data. Ma et al.[21] proposed a discriminative low-rank dictionary learning algorithm for sparse representation. Zhang et al.[22] proposed a structured low-rank representation method for image classification, in which they used the discriminative low-rank representations of training data to learn a dictionary and design a linear multi-classifier.

Inspired by the above methods, we present a low-rank sparse representation-based classification (LRSRC) method for robust FR. Given a set of test samples, LRSRC seeks the lowest-rank and sparsest representation matrix over all the training samples. As observed in [19], a low-rank model can reveal the subspace structures of data, while sparsity helps to identify the data class. Since LRSRC imposes both the lowest-rank and sparsest constraints on the representation matrix, the obtained test sample representations are both representative and discriminative. Using the representation vector of a test sample, LRSRC assigns the test sample to the class that yields the minimal reconstruction error. Moreover, LRSRC can efficiently deal with corrupted test samples owing to its accurate corruption estimation. Although our work is related to [21, 22], there are some important differences: 1) our work focuses on the low-rankness and sparsity of the representation matrix of the test samples, whereas [21], as a dictionary learning method, mainly emphasizes the low-rankness of the learned dictionary; 2) our work directly uses the training data matrix as the dictionary to compute the lowest-rank and sparsest representation matrix of the test samples, and classifies them based on minimal reconstruction errors, whereas [22] mainly uses the discriminative low-rank representations of the training data to learn a dictionary and design a linear multi-classifier for image classification.

The rest of the paper is outlined as follows. In Section 2, we briefly review some related works, including sparse representation-based classification (SRC) and low-rank representation (LRR). Section 3 presents the proposed LRSRC method for robust FR in detail. Experimental results and corresponding discussions are presented in Section 4. Finally, conclusions are drawn in Section 5.

2 Related work

2.1 Sparse representation based classification

The sparse representation-based classification (SRC) method was proposed in [12] for robust face recognition. Suppose there are C classes of training samples, and let \({X_i} = [{x_{i1}},{x_{i2}}, \cdots ,{x_{i{N_i}}}] \in {{\bf{R}}^{D \times {N_i}}}\) be the matrix formed by the Ni training samples of the i-th class, in which xij is the column vector produced by the j-th sample of the i-th class. Then X = [X1, X2, ⋯, XC] ∈ RD×N is the matrix of all training samples, where \(N = \sum\nolimits_{i = 1}^C {{N_i}}\) is the total number of training samples. Using X as the dictionary, a test sample y can be approximately represented as a linear combination of the columns of X, i.e., y = Xa, where a is the representation coefficient vector of y. If D < N, the linear system y = Xa is underdetermined, so the solution a is not unique. Since the test sample y can be adequately represented by the training samples from the correct class, the representation is naturally sparse when the number of training samples is large enough. SRC claims that the sparser the representation coefficient vector a is, the easier it is to determine the test sample's class label[12]. This motivates finding the sparsest solution of y = Xa by solving the following optimization problem:

$$\hat a = \mathop {\arg \min}\limits_a {\left\Vert {y - Xa} \right\Vert _2} + \lambda {\left\Vert a \right\Vert _1}$$
(1)

where λ is a scalar constant. The optimization problem (1) can be efficiently solved by many algorithms, such as basis pursuit[13], l1_ls[14] and the alternating direction algorithm[15].
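For concreteness, the following is a minimal numpy sketch of one such solver, the iterative soft-thresholding algorithm (ISTA), applied to the common squared-residual form of (1), i.e., min_a (1/2)∥y − Xa∥2^2 + λ∥a∥1; the function name and the fixed iteration count are illustrative choices, not part of the original SRC formulation.

```python
import numpy as np

def ista_l1(X, y, lam=0.01, n_iter=500):
    """Sketch of ISTA for min_a 0.5 * ||y - X a||_2^2 + lam * ||a||_1."""
    a = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2                # step 1/L with L = ||X||_2^2
    for _ in range(n_iter):
        grad = X.T @ (X @ a - y)                          # gradient of the smooth term
        z = a - step * grad                               # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return a
```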

Having obtained the sparsest solution â, let δi(â) be the vector whose only nonzero entries are the entries in â that are associated with class i[12]. Using δi(â), the test sample y can be reconstructed as ŷi = Xδi(â). Then, the test sample y is classified into the class that minimizes the reconstruction error between y and ŷi:

$${\rm{identity}}(y) = \arg \mathop {\min}\limits_i {\left\Vert {y - X{\delta _i}(\hat a)} \right\Vert _2}.$$
(2)
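As an illustration of the decision rule (2), the following sketch assumes the sparse code â has already been computed (e.g., by the ISTA sketch above) and that the class labels of the training columns are supplied as an integer array; the helper name src_classify is ours.

```python
import numpy as np

def src_classify(X, labels, y, a_hat):
    """Assign y to the class whose coefficients best reconstruct it, as in Eq. (2)."""
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        delta_c = np.where(labels == c, a_hat, 0.0)   # keep only the class-c entries of a_hat
        residuals.append(np.linalg.norm(y - X @ delta_c))
    return classes[int(np.argmin(residuals))]
```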

In real-world applications, the observations are usually corrupted or occluded. A test sample y is rewritten as

$$y = {y_0} + e = Xa + e = [X,\;I]\left[ {\matrix{a \hfill \cr e \hfill \cr}} \right] = Bw$$
(3)

where B = [X, I] ∈ RD×(N+D) and I ∈ RD×D is the identity matrix. The clean test sample y0 and the corruption error e can be sparsely encoded over the dictionary X and the occlusion dictionary I, respectively. The sparse solution ŵ = [â, ê]T can be computed by solving an optimization problem similar to problem (1). Then, the test sample y is classified by the following decision rule:

$${\rm{identity}}(y) = \arg \mathop {\min}\limits_i {\left\Vert {y - X{\delta _i}(\hat a) - \hat e} \right\Vert _2}.$$
(4)
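A sketch of this occlusion-handling variant, assuming any L1 solver with the same calling convention as the ISTA sketch above; the helper name and the way labels are passed are our own conventions.

```python
import numpy as np

def occlusion_src_classify(X, labels, y, l1_solver, lam=0.01):
    """SRC with the augmented dictionary B = [X, I], following (3) and (4)."""
    D, N = X.shape
    B = np.hstack([X, np.eye(D)])                  # occlusion dictionary appended to X
    w = l1_solver(B, y, lam)                       # w stacks the code a and the error e
    a_hat, e_hat = w[:N], w[N:]
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - X @ np.where(labels == c, a_hat, 0.0) - e_hat)
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```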

2.2 Low-rank representation

Recently, Liu et al.[19] presented a low-rank representation (LRR) method to recover the subspace structures from data that reside on a union of multiple subspaces. LRR seeks the lowest-rank representation matrix Z of data matrix X over a given dictionary A. The rank minimization problem is formulated as

$$\mathop {\min}\limits_{Z,E} \;{\rm{rank}}(Z) + \lambda {\left\Vert E \right\Vert _l},\quad {\rm{s}}.{\rm{t}}.\quad X = AZ + E$$
(5)

where λ is a scalar constant, and ∥E∥l denotes a regularization term chosen to characterize different types of corruption, e.g., ∥E∥0 for random corruption and ∥E∥2,0 for sample-specific corruption.

If one only considers the case in which the entries of the data matrix X are randomly corrupted, the rank minimization problem can be formulated as

$$\mathop {\min}\limits_{Z,E} \;{\rm{rank}}(Z) + \lambda {\left\Vert E \right\Vert _0},\quad {\rm{s}}.{\rm{t}}.\quad X = AZ + E.$$
(6)

As the optimization problem (6) is NP-hard and computationally expensive to solve, Liu et al.[19] replaced the rank function with the nuclear norm and the L0-norm with the L1-norm. The optimization problem (6) is thus relaxed to

$$\mathop {\min}\limits_{Z,E} {\left\Vert Z \right\Vert _{\ast}}+\lambda {\left\Vert E \right\Vert _1},\quad {\rm{s}}.{\rm{t}}.\quad X = AZ + E$$
(7)

where ∥Z∥* denotes the nuclear norm (i.e., the sum of the singular values) of Z. Given an appropriate dictionary A (in [19], the data matrix X itself is used as the dictionary), the augmented Lagrange multiplier (ALM) algorithm[23] can be used to solve the optimization problem (7). Having obtained the optimal solution (Z*, E*), the representation matrix Z* can be used to recover the underlying subspace structures of the data X.
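As a small numerical check of the relaxation in (7), the nuclear norm is simply the sum of singular values, which numpy exposes directly; the matrix here is random and purely illustrative.

```python
import numpy as np

Z = np.random.randn(6, 4)
nuclear = np.linalg.norm(Z, ord='nuc')                        # ||Z||_*
assert np.isclose(nuclear, np.linalg.svd(Z, compute_uv=False).sum())
```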

3 Low-rank sparse representation based classification

In this section, we present the low-rank sparse representation-based classification method for robust face recognition. LRSRC first seeks the lowest-rank and sparsest representation matrix of the test samples. Then, using the representation vector of a test sample, LRSRC assigns the test sample to the class that yields the minimal reconstruction error.

3.1 Low-rank sparse representation

Let the training data matrix X = [X1, X2, ⋯, XC] ∈ RD×N be composed of C classes of training samples, and let Y = [y1, y2, ⋯, yM] ∈ RD×M be the data matrix of all test samples. Using X as the dictionary, a test sample yi can be approximately represented as a linear combination of the columns of X, i.e., yi = Xzi, where zi is the representation coefficient vector of yi. Thus, Y = XZ, where Z = [z1, z2, ⋯, zM] ∈ RN×M is the representation matrix of Y.

SRC[12] imposes the sparsest constraint on the representation vector to make it discriminative. However, it does not consider the subspace structures of the data, which are essential for classification tasks. As shown in [19], a low-rank model can reveal the subspace structures of data. Therefore, we impose both the lowest-rank and sparsest constraints on the representation matrix Z and formulate the optimization problem as

$$\mathop {\min}\limits_Z \;{\rm{rank}}(Z) + \beta {\left\Vert Z \right\Vert _0},\quad {\rm{s}}.{\rm{t}}.\quad Y = XZ$$
(8)

where β is a scalar constant to balance low-rankness and sparsity.

As the optimization problem (8) is non-convex, we substitute the L1-norm for the L0-norm and the nuclear norm for the rank function[19]. Therefore, the optimization problem (8) is relaxed as

$$\mathop {\min}\limits_Z {\left\Vert Z \right\Vert _{\ast}}+\beta {\left\Vert Z \right\Vert _1},\quad {\rm{s}}.{\rm{t}}.\quad Y = XZ.$$
(9)

In many real-world applications, face images are usually corrupted or occluded. Thus, the test data matrix can be rewritten as Y = Y0 + E = XZ + E, in which Y0 is the clean test data matrix and E is the error matrix. The optimization problem is rewritten as

$$\matrix{{\,\mathop {\min}\limits_{Z,E} {{\left\Vert Z \right\Vert}_{\ast}}+\beta {{\left\Vert Z \right\Vert}_1} + \lambda {{\left\Vert E \right\Vert}_1}} \hfill \cr{{\rm{s}}.{\rm{t}}.\quad Y = XZ + E} \hfill \cr}$$
(10)

where λ is a scalar parameter.

3.2 Optimization

By introducing an auxiliary variable W, the problem (10) can be converted into an equivalent optimization problem as

$$\matrix{{\,\mathop {\min}\limits_{Z,E} {{\left\Vert Z \right\Vert}_{\ast}}+\beta {{\left\Vert W \right\Vert}_1} + \lambda {{\left\Vert E \right\Vert}_1}} \hfill \cr{{\rm{s}}.{\rm{t}}.\quad Y = XZ + E,Z = W.} \hfill \cr}$$
(11)

The optimization problem (11) can be solved by the linearized alternating direction method with adaptive penalty (LADMAP)[24-26]. We give the augmented Lagrange function of problem (11) as

$$\matrix{{L(Z,W,E,{T_1},{T_2},\mu) = {{\left\Vert Z \right\Vert}_\ast} + \beta {{\left\Vert W \right\Vert}_1} + \lambda {{\left\Vert E \right\Vert}_1} +} \hfill \cr{\qquad \left\langle {{T_1},Y - XZ - E} \right\rangle + \left\langle {{T_2},Z - W} \right\rangle +} \hfill \cr{\qquad {\mu \over 2}(\left\Vert {Y - XZ - E} \right\Vert _F^2 + \left\Vert {Z - W} \right\Vert _F^2)} \hfill \cr}$$
(12)

where T1 and T2 are Lagrange multipliers, and µ > 0 is a penalty parameter. With some simple algebra, the above augmented Lagrangian function can be rewritten as

$$\matrix{{L(Z,W,E,{T_1},{T_2},\mu) = {{\left\Vert Z \right\Vert}_\ast} + \beta {{\left\Vert W \right\Vert}_1} + \lambda {{\left\Vert E \right\Vert}_1} +} \hfill \cr{\qquad h(Z,W,E,{T_1},{T_2},\mu) - {1 \over {2\mu}}(\left\Vert {{T_1}} \right\Vert _F^2 + \left\Vert {{T_2}} \right\Vert _F^2)} \hfill \cr}$$
(13)

where

$$\matrix{{h(Z,W,E,{T_1},{T_2},\mu) =} \hfill \cr{\quad {\mu \over 2}\left({\left\Vert {Y - XZ - E + {{{T_1}} \over \mu}} \right\Vert _F^2 + \left\Vert {Z - W + {{{T_2}} \over \mu}} \right\Vert _F^2} \right).} \hfill \cr}$$
(14)

We update the variables Z, W and E alternately, minimizing the function L with the other two variables fixed. In each iteration, the update rules are given as

$$\matrix{{{Z_{k + 1}} = \mathop {\arg \min}\limits_Z {{\left\Vert Z \right\Vert}_\ast} +} \hfill \cr{\quad \left\langle {{\nabla _Z}h({Z_k},{W_k},{E_k},{T_{1,k}},{T_{2,k}},{\mu _k}),Z - {Z_k}} \right\rangle +} \hfill \cr{\quad {{{\mu _k}L} \over 2}\left\Vert {Z - {Z_k}} \right\Vert _F^2 =} \hfill \cr{\quad \mathop {\arg \min}\limits_Z {{\left\Vert Z \right\Vert}_\ast} + {{{\mu _k}L} \over 2}\left\Vert {Z - {A_k}} \right\Vert _F^2 =} \hfill \cr{\quad {\Theta _{{{({\mu _k}L)}^{- 1}}}}({A_k})} \hfill \cr}$$
(15)

where

$$\matrix{{{A_k} = {Z_k} + {X^{\rm{T}}}{{Y - X{Z_k} - {E_k} + {{{T_{1,k}}} \over {{\mu _k}}}} \over L} -} \hfill \cr{\qquad {{{Z_k} - {W_k} + {{{T_{2,k}}} \over {{\mu _k}}}} \over L}.} \hfill \cr}$$

\(L = \left\Vert X \right\Vert _2^2\), Θ is the singular value thresholding operator[27], and ∇Zh is the partial derivative of the function h with respect to Z.
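A minimal sketch of the singular value thresholding operator Θτ used in (15); it shrinks the singular values of its argument by τ and discards those that become non-positive. The function name is ours.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: Theta_tau(A) = U diag(max(s - tau, 0)) V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```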

$$\matrix{{{W_{k + 1}} = \mathop {\arg \min}\limits_W \beta {{\left\Vert W \right\Vert}_1} +} \hfill \cr{\qquad {{{\mu _k}} \over 2}\left\Vert {{Z_{k + 1}} - {W_k} + {{{T_{2,k}}} \over {{\mu _k}}}} \right\Vert _F^2 =} \hfill \cr{\qquad {S_{\beta \mu _k^{- 1}}}\left({{Z_{k + 1}} + {{{T_{2,k}}} \over {{\mu _k}}}} \right)} \hfill \cr}$$
(16)
$$\matrix{{{E_{k + 1}} = \mathop {\arg \min}\limits_E \lambda {{\left\Vert E \right\Vert}_1} +} \hfill \cr{\qquad {{{\mu _k}} \over 2}\left\Vert {Y - X{Z_{k + 1}} - E + {{{T_{1,k}}} \over {{\mu _k}}}} \right\Vert _F^2 =} \hfill \cr{\qquad {S_{\lambda \mu _k^{- 1}}}\left({Y - X{Z_{k + 1}} + {{{T_{1,k}}} \over {{\mu _k}}}} \right).} \hfill \cr}$$
(17)

In (16) and (17), S is the soft-thresholding operator[28]. The complete procedure is outlined in Algorithm 1.
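For completeness, a one-line sketch of the entry-wise soft-thresholding operator Sτ appearing in (16) and (17); the function name is ours.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entry-wise shrinkage: S_tau(A)_ij = sign(A_ij) * max(|A_ij| - tau, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)
```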

3.3 Classification

Given a training data matrix X = [X1, X2, ⋯, XC] ∈ RD×N and a test data matrix Y = [y1, y2, ⋯, yM] ∈ RD×M, we can find the lowest-rank and sparsest representation matrix Ẑ = [ẑ1, ẑ2, ⋯, ẑM] and the error matrix Ê = [ê1, ê2, ⋯, êM] of Y by Algorithm 1. Let δi(ẑj) be the vector whose only nonzero entries are the entries in ẑj that are associated with class i[12]. Using δi(ẑj), the test sample yj can be reconstructed as \(\hat y_j^i = X{\delta _i}({{\hat z}_j}) + {{\hat e}_j}\). Then, the test sample yj is classified into the class that minimizes the reconstruction error between yj and \(\hat y_j^i\):

$${\rm{identity}}({y_j}) = \arg {\min _i}{\left\Vert {{y_j} - X{\delta _i}({{\hat z}_j}) - {{\hat e}_j}} \right\Vert _2}.$$
(18)

The proposed LRSRC algorithm is summarized in Algorithm 2.

Algorithm 1. LADMAP algorithm for low-rank sparse representation

  • Input: Training data matrix X = [X1, X2, ⋯, XC] ∈ RD×N, test data matrix Y = [y1, y2, ⋯, yM] ∈ RD×M, and parameters λ > 0, β > 0.

  • Initialize: \({Z_0} = {W_0} = {E_0} = {T_{1,0}} = {T_{2,0}} = 0,{\mu _0} = 0.1,{\mu _{\max}} = {10^{10}},\rho = 1.1,{\varepsilon _1} = {10^{- 2}},{\varepsilon _2} = {10^{- 6}},L = \left\Vert X \right\Vert_2^2,{\rm{maxiter}} = 1000,k = 0\).

  while not converged and k ≤ maxiter do

    1) Update the variable Z according to (15).

    2) Update the variable W according to (16).

    3) Update the variable E according to (17).

    4) Update the Lagrange multipliers:

      $$\matrix{{{T_{1,k + 1}} = {T_{1,k}} + {\mu _k}(Y - X{Z_{k + 1}} - {E_{k + 1}})} \hfill \cr {{T_{2,k + 1}} = {T_{2,k}} + {\mu _k}({Z_{k + 1}} - {W_{k + 1}}).} \hfill \cr}$$

    5) Update the parameter μ: μk+1 = min (ρμk, μmax).

    6) Check the convergence condition:

      $${{{\mu _{k + 1}}\max \left({\sqrt L {{\left\Vert {{Z_{k + 1}} - {Z_k}} \right\Vert}_F},{{\left\Vert {{W_{k + 1}} - {W_k}} \right\Vert}_F},{{\left\Vert {{E_{k + 1}} - {E_k}} \right\Vert}_F}} \right)} \over {{{\left\Vert Y \right\Vert}_F}}} < {\varepsilon _1}\;{\rm{and}}\;{{{{\left\Vert {Y - X{Z_{k + 1}} - {E_{k + 1}}} \right\Vert}_F}} \over {{{\left\Vert Y \right\Vert}_F}}} < {\varepsilon _2}.$$

    7) k ← k + 1

  end while

  • Output: An optimal solution (Z k , W k , E k ).
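The following is a minimal numpy sketch of Algorithm 1 as we read it, repeating the svt and soft_threshold helpers sketched in Section 3.2 so the block is self-contained. The default parameter values mirror the initialization above; this is an illustration of the updates (15)-(17), not a reference implementation.

```python
import numpy as np

def soft_threshold(A, tau):
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(A, tau):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def lrsr_ladmap(X, Y, lam=10.0, beta=0.09, mu=0.1, mu_max=1e10, rho=1.1,
                eps1=1e-2, eps2=1e-6, max_iter=1000):
    """Sketch of LADMAP for min ||Z||_* + beta ||Z||_1 + lam ||E||_1 s.t. Y = XZ + E."""
    D, N = X.shape
    M = Y.shape[1]
    Z, W = np.zeros((N, M)), np.zeros((N, M))
    E, T1, T2 = np.zeros((D, M)), np.zeros((D, M)), np.zeros((N, M))
    L = np.linalg.norm(X, 2) ** 2                          # L = ||X||_2^2
    nY = np.linalg.norm(Y, 'fro')
    for _ in range(max_iter):
        # Step 1: update Z via singular value thresholding of the linearized subproblem (15)
        A_k = Z + (X.T @ (Y - X @ Z - E + T1 / mu) - (Z - W + T2 / mu)) / L
        Z_new = svt(A_k, 1.0 / (mu * L))
        # Steps 2-3: update W and E by entry-wise soft-thresholding, (16) and (17)
        W_new = soft_threshold(Z_new + T2 / mu, beta / mu)
        E_new = soft_threshold(Y - X @ Z_new + T1 / mu, lam / mu)
        # Step 4: update the Lagrange multipliers
        T1 = T1 + mu * (Y - X @ Z_new - E_new)
        T2 = T2 + mu * (Z_new - W_new)
        # Step 5: update the penalty parameter
        mu = min(rho * mu, mu_max)
        # Step 6: convergence check (relative change and feasibility)
        change = max(np.sqrt(L) * np.linalg.norm(Z_new - Z, 'fro'),
                     np.linalg.norm(W_new - W, 'fro'),
                     np.linalg.norm(E_new - E, 'fro'))
        feas = np.linalg.norm(Y - X @ Z_new - E_new, 'fro')
        Z, W, E = Z_new, W_new, E_new
        if mu * change / nY < eps1 and feas / nY < eps2:
            break
    return Z, E
```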

Algorithm 2. Low-rank sparse representation-based classification

  • Input: Training data matrix X = [X1, X2, ⋯, X C ] ∈ RD×N belonging to C classes, test data matrix Y = [y1, y2, ⋯, y M ] ∈ RD×M, parameters λ > 0, β > 0.

    1) Normalize the columns of X to have unit L2-norm.

    2) Solve the following optimization problem by Algorithm 1 to obtain the representation matrix Ẑ and the error matrix Ê of the test data matrix Y:

      $$\mathop {\min}\limits_{Z,E} {\left\Vert Z \right\Vert_\ast} + \beta {\left\Vert Z \right\Vert_1} + \lambda {\left\Vert E \right\Vert_1},\quad {\rm{s}}.{\rm{t}}.\quad Y = XZ + E.$$

    3) for j = 1: M do

      $${\rm{identity}}({y_j}) = \arg {\min _i}{\left\Vert {{y_j} - X{\delta _i}({{\hat z}_j}) - {{\hat e}_j}} \right\Vert_2},\quad i = 1, \cdots ,C.$$

      end for

  • Output: Class labels of all the test samples.
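Below is a sketch of Algorithm 2 built on top of the lrsr_ladmap sketch given after Algorithm 1; the label handling and function names are our own conventions, not part of the paper.

```python
import numpy as np

def lrsrc_classify(X, train_labels, Y, lam=10.0, beta=0.09):
    """Sketch of LRSRC: normalize X, solve for (Z, E), then apply the rule in (18)."""
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)    # unit L2-norm columns
    Z, E = lrsr_ladmap(Xn, Y, lam=lam, beta=beta)        # helper sketched after Algorithm 1
    classes = np.unique(train_labels)
    preds = []
    for j in range(Y.shape[1]):
        residuals = [np.linalg.norm(Y[:, j]
                                    - Xn @ np.where(train_labels == c, Z[:, j], 0.0)
                                    - E[:, j])
                     for c in classes]
        preds.append(classes[int(np.argmin(residuals))])
    return np.array(preds)
```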

4 Experiments

We perform experiments on the Extended Yale B[29], CMU PIE[30] and AR[31] databases to evaluate the performance of the proposed LRSRC method. We compare LRSRC with several state-of-the-art methods, including the nearest neighbor classifier (NN)[9], SRC[12], ESRC[16], CRC[17] and LRC[18]. Note that PCA is used to reduce the dimensionality of the face images, and the resulting Eigenface features are used in all the methods.
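For reference, the feature-extraction step shared by all methods can be sketched as follows, using scikit-learn's PCA as a stand-in for the Eigenface projection; the function name and the rows-as-images layout are our assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def eigenface_features(train_imgs, test_imgs, dim=200):
    """Project vectorized face images (one per row) onto the first `dim` eigenfaces."""
    pca = PCA(n_components=dim)
    train_feat = pca.fit_transform(train_imgs)   # fit the eigenfaces on training images only
    test_feat = pca.transform(test_imgs)
    return train_feat.T, test_feat.T             # columns as samples, matching X and Y above
```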

4.1 Extended Yale B database

The Extended Yale B database[29] consists of 2414 face images of 38 individuals. Each individual has about 64 images, taken under various laboratory-controlled lighting conditions. Fig. 1 (a) shows several example images of one subject in the Extended Yale B database. In our experiments, each image is manually cropped and resized to 32 × 32 pixels. We randomly select l (= 16, 24, 32) samples per individual for training and use the remainder for testing. For each given l, we independently run all the methods 5 times and report the average recognition rates over the 5 runs. The Eigenface feature dimensions are set to 50, 100, 150, 200, 300 and 500 for comparing recognition rates across the methods. The LRSRC parameters are set to λ = 10, with β = 0.09, 0.1 and 0.08 for 16, 24 and 32 training samples per individual, respectively. Fig. 2 shows the curves of recognition rate versus feature dimension. Table 1 gives the best recognition rates of all the methods.

Fig. 1 Example images of the three databases

Fig. 2 Recognition rate versus feature dimension of all the methods on the Extended Yale B database

Table 1 The best recognition rates (%) and the corresponding dimensions (in parentheses) of all the methods on the Extended Yale B database

From Fig. 2, one can see that our method outperforms all the other methods at most feature dimensions for all three training set sizes. For example, at a feature dimension of 200, our method improves the recognition rate by more than 3% over all the other methods for the three training sets. Meanwhile, Table 1 shows that the best recognition rates of our method are always higher than those of the other methods. In particular, our method achieves a best recognition rate of 94.87% when 16 samples per individual are used for training, while SRC only reaches a best recognition rate of 94.26% even when 32 samples per individual are used for training.

This indicates that our method can achieve good recognition performance with a smaller number of training samples. The experimental results confirm the effectiveness and robustness of our method even when face images are corrupted by illumination variations.

4.2 CMU PIE database

The CMU PIE database[30] contains over 40 000 face images of 68 individuals. Images of each individual were acquired across 13 different poses, under 43 different illumination conditions, and with 4 different expressions. Fig. 1 (b) shows some example images of one subject in the CMU PIE database. Here, we use a near-frontal pose subset, namely C07, which contains 1629 images of 68 individuals, each with about 24 images. In our experiments, all images are manually cropped and resized to 32 × 32 pixels. A random subset with l (= 8, 10, 12) samples per individual is selected for training, and the rest are used for testing. For each given l, we independently run all the methods 5 times and report the average recognition rates over the 5 runs. The Eigenface feature dimensions are set to 50, 75, 100, 150, 200 and 300 for comparing recognition rates across the methods. The LRSRC parameters are set to λ = 10 and β = 0.09 in these experiments. The curves of recognition rate versus feature dimension are presented in Fig. 3. The best recognition rates of all the methods are shown in Table 2.

Fig. 3 Recognition rate versus feature dimension of all the methods on the CMU PIE database

Table 2 The best recognition rates (%) and the corresponding dimensions (in parentheses) of all the methods on the CMU PIE database

From Fig. 3, one can clearly see that our method outperforms all the other methods at most feature dimensions for the three training sets. Specifically, the recognition rates of our method are about 1.5% higher than those of the other methods at most dimensions for 8 and 10 training samples per individual, and about 1% higher for 12 training samples per individual. Furthermore, Table 2 shows that the best recognition rate of our method is always higher than those of the other methods for each training set. In addition, Table 2 also shows that SRC achieves a best recognition rate of 93.78% when 12 samples per individual are used for training, which is still lower than that of our method even when only 8 samples per individual are used for training. This again indicates that our method can obtain good recognition performance with a smaller number of training samples. These results further verify the effectiveness and robustness of our method in dealing with face images corrupted by illumination variations and expression changes.

4.3 AR database

The AR database[31] includes over 4000 color face images of 126 individuals. For each individual, 26 images were taken in 2 separate sessions. In each session, there are 13 images per person, among which 3 are occluded by sunglasses, 3 by scarves, and the remaining 7 exhibit only illumination and expression variations (and are thus referred to as neutral images). All images are cropped to 50 × 40 pixels and converted to gray scale. As in [10], we choose a subset containing 50 male subjects and 50 female subjects for the experiments. Fig. 1 (c) shows all the images of one subject in the AR database. Following the scheme in [32], we consider 3 scenarios to evaluate the performance of our method.

Sunglasses: In this scenario, we consider the situation in which both the training and test samples are corrupted by sunglass occlusion. Sunglasses occlude about 20% of the face image. We use 7 neutral images and 1 image with sunglasses from the first session for training, and 7 neutral images from the second session together with the remaining 5 images with sunglasses from both sessions for testing.

Scarf: In this scenario, we consider corruption of both the training and test images caused by scarf occlusion. The scarf occludes around 40% of the face image. Similarly, we choose 8 training images, consisting of 7 neutral images plus 1 image with a scarf from the first session, and 12 test images, consisting of 7 neutral images from the second session and the remaining 5 images with scarves from both sessions.

Mixed (Sunglasses + Scarf): In this scenario, we consider the case in which both the training and test images are occluded by sunglasses and scarves. We select 7 neutral images plus 2 occluded images (one with sunglasses and one with a scarf) from the first session for training, and the remaining 17 images from both sessions for testing.

In this experiment, the Eigenface feature dimensions are set to 50, 100, 150, 200, 300 and 500 for comparing recognition rates across the methods. The LRSRC parameters are set to λ = 10 and β = 0.09. Fig. 4 gives the recognition results for the 3 scenarios. Table 3 summarizes the best recognition rates of all the methods under the 3 scenarios.

Fig. 4 Recognition rate curves of all the methods on the AR database

Table 3 The best recognition rates (%) and the corresponding dimensions (in parentheses) of all the methods on the AR database

From Fig. 4, one can see that the proposed method consistently gives the highest recognition rate at most feature dimensions under the 3 scenarios. At the same time, Table 3 shows that the best recognition rate of our method is the highest for each scenario. Specifically, our method improves the best recognition rate over the strongest competitor, SRC, by more than 1% in the Sunglasses scenario and by about 3.5% in the Scarf and Mixed scenarios. Moreover, Table 3 also shows that our method outperforms the other 4 methods under all 3 scenarios. These results demonstrate the effectiveness and robustness of our method in processing occluded face images.

5 Conclusions

In this paper, we have presented a low-rank sparse representation-based classification (LRSRC) method for robust face recognition. LRSRC efficiently finds the lowest-rank and sparsest representation matrix of a set of test samples over all the training samples, and then classifies each test sample using its representation vector. Experimental results on the Extended Yale B, CMU PIE and AR databases confirm that the proposed method is effective and robust, even when face images are corrupted by illumination variations, expression changes, disguises or occlusions. It should be pointed out that LRSRC performs classification on the entire test set jointly. In future work, we will investigate how to extend LRSRC to the single-test-sample scenario.