1 Introduction

In recent years, representation based classification (RC) has been a hot research topic in the pattern recognition and machine learning communities, particularly in face recognition. The main procedure of RC is as follows: for a test sample, the RC method exploits the features extracted from the training samples (or directly uses the training samples) to construct a model that represents this test sample by using one or more types of vector norms, e.g., the L1 norm. Then, once the constructed model is solved, the test sample is classified according to the representation result. RC has variants that can be used in other applications such as video analysis [6, 33]. Up to now, researchers have applied several types of vector norms to represent the samples, e.g., the L1, L1/2, L2,1 and L2 norms [23, 27, 36, 38, 40, 41, 52]. Note that the L1 norm minimization can yield a sparse representation of the samples. Classification methods based on sparse representation are considered robust to outliers and data corruption, e.g., image occlusion; this type of method is referred to as sparse representation based classification (SRC) [1, 43, 47]. In principle, the representation result obtained with the L1/2 norm is sparser than that obtained with the L1 norm [41]. Nevertheless, solving the L1/2 norm minimization problem usually entails iteratively solving L1 norm minimizations. The L2,1 norm based classification is robust to outliers because it selects samples so as to yield grouped sparsity [26, 27]. Although the L2 norm minimization can usually be solved faster than the L1 norm minimization, models based on the L2 norm cannot produce sparse solutions [52]; moreover, they are believed to be less robust to image noise and outliers.

Among the RC methods mentioned above, SRC is one of the most popular sparse representation methods for classification in face recognition. Theoretically, RC approaches based on the L0 norm minimization (i.e., requiring the representation coefficients to contain the fewest non-zero components) [12] produce the sparsest representation, which may lead to desirable recognition performance [38]. However, the L0 norm minimization is essentially an NP-hard problem. Furthermore, if enough training samples are available and the representation of the test sample over them is sufficiently sparse, the solution of the L0 norm minimization problem equals that of the L1 norm minimization problem [4, 10, 38]. The L1 norm minimization can be solved much more efficiently than the NP-hard L0 norm minimization; moreover, it is a convex problem, so global optimality is easier to guarantee. Therefore, the L0 norm minimization is often replaced by the L1 norm minimization in many applications involving representation methods [34, 39, 46, 48, 49, 55].

However, RC using the L1 norm minimization faces a serious obstacle. When the training set is very small, that is, when there are very few training samples in each class, the L1 norm minimization may not yield a sparse representation if the data dimensionality is high. Here, we refer to this problem as the undersampling classification problem. As a result, typical SRC methods based on the L1 norm minimization may not achieve desirable classification effectiveness on very small-scale training sets with high-dimensional data, e.g., face image data. To address the undersampling classification problem in face recognition, Deng et al. proposed an extended SRC (ESRC) [8] which combines the training set with an auxiliary intraclass variation dictionary [9] to represent the test samples. This method is implemented in the original input space and performs well in many cases. Nevertheless, ESRC fails to capture the nonlinear information within the data set, which is helpful for classification.

If the data in the original input space are mapped into a high-dimensional feature space, i.e., a reproducing kernel Hilbert space (RKHS) [28], by a nonlinear map, and the mapped data are used to construct the SRC based model, then the new model can in principle capture the nonlinear information within the data. Based on this idea, a number of SRC-based algorithms implemented in the transformed feature space have been proposed to handle nonlinear similarity; they are referred to as kernel sparse representation for classification (KSRC) [5, 51]. In [15, 16], Gao et al. proposed such a kernel sparse representation to capture the nonlinear similarity of features. KSRC can be viewed as an extension of SRC and overcomes some drawbacks of SRC [54]. One of these drawbacks is that SRC cannot classify a testing sample that has the same vector direction as training samples belonging to many classes [53]. To remedy this deficiency, Jian et al. [19] used multi-objective optimization to develop a class-discriminative kernel sparse representation-based classification, called KSRC 2.0, in which the multi-objective functions are formulated by the Fisher discrimination criterion. By employing KSRC to represent the nonlinearities in the high-dimensional RKHS, Shrivastava et al. developed a multiple kernel learning approach which uses a two-step training procedure to learn the kernel weights and the sparse representation coefficients [31]. For robust multimodal biometrics recognition, Shekhar et al. proposed a multimodal sparse representation method and kernelized the algorithm to handle nonlinearity in the data [30]. In addition, kernel sparse representation can be used in one-class classification models for remote sensing image analysis [32]. All the above kernel sparse representation methods rely on directly solving the L1 norm minimization problem.

The main goal of our work is to address the undersampling problem, also called the small sample size problem, which arises in many image recognition tasks where the number of training images is small and the dimensionality of the images is very high; currently popular deep learning based methods [7, 21] may fail in this setting since they usually require a very large-scale training set. Besides, our work aims to expedite the computation in the high-dimensional feature space and to be very easy to implement. Two contributions are made in this paper. First, in order to capture the nonlinear information within the data and overcome the drawback of ESRC being performed in the original input space, we reformulate the ESRC algorithm in RKHS and propose a kernel ESRC (KESRC) model. Furthermore, typical SRC and ESRC algorithms are based on the L1 norm minimization, and hence their computational cost is usually very high. Our second contribution is to apply a kernel coordinate descent (KCD) approach [37] to implement KESRC. KCD is the kernel version of the coordinate descent method, which is simple and efficient [14]. The solution of our proposed approach does not involve direct L1 norm minimization. Therefore, the computational cost of our method is much lower than that of the SRC and ESRC algorithms; that is, the proposed algorithm is far faster than typical SRC based algorithms including SRC, ESRC and KSRC. Hereafter, our algorithm is referred to as the fast KESRC (FKESRC) algorithm. Experiments on several popular face data sets demonstrate that our algorithm achieves very high computational efficiency without significantly degrading classification effectiveness.

The rest of the paper is organized as follows: In Section 2, we review the traditional SRC and ESRC algorithms. Section 3 describes our FKESRC algorithm. Section 4 gives the details of the experimental results and illustrates the effectiveness of the proposed algorithm. Section 5 offers our conclusions.

2 Review of SRC and ESRC

In this section, we review two typical sparse representation based classification methods: SRC and ESRC. The typical SRC algorithm for image classification, especially face recognition, was originally proposed by Wright et al. [38] and has attracted much attention in recent years. The typical SRC is introduced as follows.

Assume that the training set is denoted as X = [x1, x2, ..., xN] ∈ RD × N, where xi ∈ RD (i = 1, 2, ..., N) denotes the ith training sample. Given a test sample yt ∈ RD (t = 1, 2, ..., T), the traditional SRC essentially seeks the sparsest solution to yt = Xα, where α = [α1, α2, ..., αN]T and αi is the representation coefficient corresponding to the ith training sample, by solving the following optimization problem

$$ {\hat{\alpha}}_1=\mathrm{argmin}{\left\Vert \alpha \right\Vert}_1\;\mathrm{s}.\mathrm{t}.{y}_t= X\alpha $$
(1)

where ‖α‖1 denotes the L1 norm of the coefficient vector α. Note that the above SRC model can achieve desirable classification effectiveness under the condition that sufficient training samples are available [38]. When the data dimensionality is very high and the number of training samples is small, SRC may not perform well. As mentioned above, ESRC, proposed by Deng et al. [8], overcomes the drawback of typical SRC algorithms that they cannot properly handle the setting in which the number of training samples in each class is small. In ESRC, an intraclass variation matrix is combined with the training samples to represent the test samples. Specifically, the intraclass variation matrix is defined as [8]

$$ {A}_I=\left[{A}_1-{m}_1{I}_1,...,{A}_c-{m}_c{I}_c\right]\in {R}^{D\times N} $$
(2)

where c is the number of sample classes, mi is the center of the ith class (i = 1, 2, ..., c), Ai denotes the sample matrix of the ith class in which each column represents one training sample, and \( {I}_j=\left[1,...,1\right]\in {R}^{1\times {n}_j} \), where nj is the number of training samples in the jth class. For a test sample yt ∈ RD, ESRC represents this sample by using the intraclass variation matrix AI and the training set as follows

$$ {y}_t= X\eta +{A}_I\beta +e $$
(3)

where η and β are the representation coefficients respectively associated with X and AI, and e denotes the small dense noise. Then, the ESRC model aims to solve the following problem

$$ \left[\begin{array}{c}{\hat{\eta}}_1\\ {}{\hat{\beta}}_1\end{array}\right]=\mathrm{argmin}{\left\Vert \left[\begin{array}{c}\eta \\ {}\beta \end{array}\right]\right\Vert}_1,s.t.{\left\Vert \Big[X,{A}_I\Big]\left[\begin{array}{c}\eta \\ {}\beta \end{array}\right]-{y}_t\right\Vert}_2\le \varepsilon $$
(4)

where ε is the error tolerance, which is a small positive constant.
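
Before moving to the kernel version, it may help to see how the ESRC dictionary is assembled from a labeled training matrix. The NumPy sketch below of Eq. (2) is ours and purely illustrative; it keeps the input column order rather than grouping columns by class, which does not change the model since only the pairing between dictionary columns and coefficients matters.

```python
import numpy as np

def intraclass_variation_matrix(X, labels):
    """Eq. (2): subtract the class centre m_i from every training column of X (D x N)."""
    A_I = np.empty_like(X)
    for c in np.unique(labels):
        idx = labels == c
        A_I[:, idx] = X[:, idx] - X[:, idx].mean(axis=1, keepdims=True)
    return A_I

# toy usage: D = 6, two classes with two samples each
X = np.random.randn(6, 4)
labels = np.array([0, 0, 1, 1])
A_I = intraclass_variation_matrix(X, labels)
B = np.hstack([X, A_I])   # the extended dictionary [X, A_I] over which Eq. (4) represents y_t
```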

3 Fast kernel ESRC (FKESRC)

Motivated by the basic idea of the KSRC algorithm, we improve the ESRC algorithm by employing the kernel trick and propose a fast kernel ESRC (FKESRC) algorithm in this section. In Subsection 3.1, we first construct the kernel ESRC (KESRC) model. Then, we give the fast algorithm of KESRC in Subsection 3.2. And finally, the time complexity is analyzed in Subsection 3.3.

3.1 Kernel ESRC (KESRC) model

We apply a nonlinear mapping ϕ(•) to map the samples from the original input space into a high-dimensional feature space (RKHS) whose dimensionality is M (M ≫ D). The training samples in the original input space are mapped to ϕ(X) = [ϕ(x1), ϕ(x2), ..., ϕ(xN)] ∈ RM × N, the testing sample yt is mapped to ϕ(yt) ∈ RM, and the center of the ith class in the feature space is

$$ \phi \left({m}_i\right)=\frac{1}{n_i}\sum \limits_{x_j\in class\;i}\phi \left({x}_j\right) $$
(5)

where ni (i = 1,2,…,c) denotes the number of the training samples in the ith class. Similarly, the intraclass variation matrix in this space ϕ(AI) can be computed by

$$ \phi \left({A}_I\right)=\left[\phi \left({x}_1^{\hbox{'}}\right),\phi \left({x}_2^{\hbox{'}}\right),...,\phi \left({x}_N^{\hbox{'}}\right)\right] $$
(6)

where \( \phi \left({x}_j^{\hbox{'}}\right)=\phi \left({x}_j\right)-\phi \left({m}_i\right) \) (j = 1, 2, ..., N), and ϕ(mi) denotes the center of the class to which the sample ϕ(xj) belongs in the transformed space. Then, the testing sample ϕ(yt) can be represented by the training samples combined with the intraclass variation matrix ϕ(AI) as follows

$$ \phi \left({y}_t\right)=\phi (X)\theta +\phi \left({A}_I\right)\omega +\delta $$
(7)

where θ=[θ1, ..., θN]T and ω= [ω1, ..., ωN]T are the representation coefficients respectively associated with ϕ(X) and ϕ(AI), and δ denotes the small dense noise. Thus, the kernel ESRC model can be reformulated as

$$ \underset{\theta, \omega }{\min }{\left\Vert \begin{array}{c}\theta \\ {}\omega \end{array}\right\Vert}_1,s.t.{\left\Vert \phi \left({y}_t\right)-\phi (X)\theta -\phi \left({A}_I\right)\omega \right\Vert}_2\le \xi $$
(8)

where ξ is a small positive constant.

The above Eq. (8) is equivalent to the following minimization problem

$$ \left[\begin{array}{c}\hat{\theta}\\ {}\hat{\omega}\end{array}\right]=\underset{\theta, \omega }{\mathrm{argmin}}J\left(\theta, \omega \right)=\underset{\theta, \omega }{\mathrm{argmin}}\left(\frac{1}{2}{\left\Vert \phi \left({y}_t\right)-\phi (X)\theta -\phi \left({A}_I\right)\omega \right\Vert}_2^2\right)+\lambda {\left\Vert \begin{array}{c}\theta \\ {}\omega \end{array}\right\Vert}_1 $$
(9)

where λ is the tradeoff parameter that balances the reconstruction error and the sparseness of the coefficient vector \( \left[\begin{array}{c}\theta \\ {}\omega \end{array}\right] \). Substituting ϕ(X) = [ϕ(x1), ϕ(x2), ..., ϕ(xN)] and ϕ(AI) = \( \left[\phi \left({x}_1^{\hbox{'}}\right),\phi \left({x}_2^{\hbox{'}}\right),...,\phi \left({x}_N^{\hbox{'}}\right)\right] \) into Eq. (9), we obtain

$$ \underset{\theta, \omega }{\min }J\left(\theta, \omega \right)=\underset{\theta, \omega }{\min}\left(\frac{1}{2}{\left\Vert \phi \left({y}_t\right)-\sum \limits_{i=1}^N\left({\theta}_i\phi \left({x}_i\right)+{\omega}_i\phi \left({x}_i^{\hbox{'}}\right)\right)\right\Vert}_2^2\right)+\lambda {\left\Vert \begin{array}{c}\theta \\ {}\omega \end{array}\right\Vert}_1 $$
(10)

where ϕ(•) is an implicit nonlinear mapping that can be specified by a kernel function. Here, we employ the Gaussian kernel function to perform the nonlinear mapping. Thus, ϕ(xi)Tϕ(xi) = 1 and \( \phi {\left({x}_i^{\hbox{'}}\right)}^T\phi \left({x}_i^{\hbox{'}}\right)=1 \) (i = 1, 2, ..., N). Given two arbitrary samples x and y, the Gaussian kernel function is k(x, y) = ϕ(x)Tϕ(y) = exp(−‖x − y‖²/σ²), where σ is the Gaussian kernel parameter, which is often manually tuned in real applications, and k(x, x) = 1. Therefore, we can reformulate Eq. (10) as

$$ \underset{\theta, \omega }{\min }J\left(\theta, \omega \right)=\underset{\theta, \omega }{\min}\left(-2k\left(\bullet, {y}_t\right)\theta -2k\left(\circ, {y}_t\right)\omega +{\theta}^T K\theta +{\omega}^T G\omega +2{\theta}^T{K}^{\ast}\omega +\lambda {\left\Vert \begin{array}{c}\theta \\ {}\omega \end{array}\right\Vert}_1\right) $$
(11)

where k(•, yt) = (k(x1, yt), k(x2, yt), ..., k(xN, yt)), k(∘, yt) = \( \left(k\left({x}_1^{\hbox{'}},{y}_t\right),k\left({x}_2^{\hbox{'}},{y}_t\right),...,k\left({x}_N^{\hbox{'}},{y}_t\right)\right) \), \( K=\left(\begin{array}{ccc}k\left({x}_1,{x}_1\right)& \cdots & k\left({x}_1,{x}_N\right)\\ {}\vdots & \ddots & \vdots \\ {}k\left({x}_N,{x}_1\right)& \cdots & k\left({x}_N,{x}_N\right)\end{array}\right) \), \( G=\left(\begin{array}{ccc}k\left({x}_1^{\hbox{'}},{x}_1^{\hbox{'}}\right)& \cdots & k\left({x}_1^{\hbox{'}},{x}_N^{\hbox{'}}\right)\\ {}\vdots & \ddots & \vdots \\ {}k\left({x}_N^{\hbox{'}},{x}_1^{\hbox{'}}\right)& \cdots & k\left({x}_N^{\hbox{'}},{x}_N^{\hbox{'}}\right)\end{array}\right) \), and \( {K}^{\ast }=\left(\begin{array}{ccc}k\left({x}_1,{x}_1^{\hbox{'}}\right)& \cdots & k\left({x}_1,{x}_N^{\hbox{'}}\right)\\ {}\vdots & \ddots & \vdots \\ {}k\left({x}_N,{x}_1^{\hbox{'}}\right)& \cdots & k\left({x}_N,{x}_N^{\hbox{'}}\right)\end{array}\right) \).
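
The derivation above uses k(xi', xi') = 1, which holds when the intraclass variation matrix is formed in the input space by Eq. (2) and the Gaussian kernel is then applied to its columns. Under that reading, the kernel blocks of Eq. (11) can be computed as in the following NumPy sketch; the function and variable names are ours, not the paper's.

```python
import numpy as np

def gaussian_gram(A, B, sigma):
    """k(a, b) = exp(-||a - b||^2 / sigma^2) for every column pair of A (D x n) and B (D x m)."""
    sq = np.sum(A**2, axis=0)[:, None] + np.sum(B**2, axis=0)[None, :] - 2.0 * A.T @ B
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def kernel_blocks(X, A_I, y_t, sigma):
    """X: D x N training matrix, A_I: D x N intraclass variation matrix (Eq. (2)), y_t: test sample (D,)."""
    K = gaussian_gram(X, X, sigma)                            # K_ij  = k(x_i, x_j)
    G = gaussian_gram(A_I, A_I, sigma)                        # G_ij  = k(x'_i, x'_j)
    K_star = gaussian_gram(X, A_I, sigma)                     # K*_ij = k(x_i, x'_j)
    k_dot = gaussian_gram(X, y_t[:, None], sigma).ravel()     # k(., y_t)
    k_circ = gaussian_gram(A_I, y_t[:, None], sigma).ravel()  # k(o, y_t)
    return K, G, K_star, k_dot, k_circ
```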

Similar to the other typical SRC based models, our KESRC model is also based on the L1 norm minimization, and its traditional solution is time-consuming. Next, we will introduce a new approach to solve the proposed KESRC model by using the coordinate descent approach [14].

3.2 Fast KESRC

In this subsection, we exploit the coordinate descent approach to solve the KESRC model. Coordinate descent does not require computing the full gradient of the objective function: each coordinate-wise minimization has an explicit closed-form update, and the method exploits the sparsity of the model by mainly evaluating inner products for variables with non-zero coefficients [14]. Its computational efficiency is therefore very high, and it can greatly improve the efficiency of KESRC in theory. Hence, our proposed method is a fast KESRC (FKESRC) algorithm. In FKESRC, we need to compute two coefficient vectors θ and ω, which are associated with the training samples and the columns of the intraclass variation matrix, respectively. First, we determine the vector θ with the coordinate descent method. Note that the objective function J(θ, ω) in Eq. (11) can be rewritten as follows

$$ {\displaystyle \begin{array}{c}J\left(\theta, \omega \right)=-2\sum \limits_{i=1}^N{\theta}_ik\left({x}_i,{y}_t\right)-2\sum \limits_{i=1}^N{\omega}_ik\left({x}_i^{\hbox{'}},{y}_t\right)\\ {}+\sum \limits_{i=1}^N\sum \limits_{j=1}^N{\theta}_i{\theta}_jk\left({x}_i,{x}_j\right)+2\sum \limits_{i=1}^N\sum \limits_{j=1}^N{\theta}_i{\omega}_jk\left({x}_j^{\hbox{'}},{x}_i\right)+\sum \limits_{i=1}^N\sum \limits_{j=1}^N{\omega}_i{\omega}_jk\left({x}_i^{\hbox{'}},{x}_j^{\hbox{'}}\right)+\lambda {\left\Vert \begin{array}{c}\theta \\ {}\omega \end{array}\right\Vert}_1\end{array}} $$
(12)

We take the partial derivative of J(θ, ω) with respect to θi(i = 1, 2, ..., N), and have

$$ \frac{\partial J\left(\theta, \omega \right)}{\partial {\theta}_i}=-2k\left({x}_i,{y}_t\right)+2\sum \limits_{j=1}^N{\theta}_jk\left({x}_i,{x}_j\right)+2\sum \limits_{j=1}^N{\omega}_jk\left({x}_i,{x}_j^{\hbox{'}}\right)+\lambda \operatorname{sgn}\left({\theta}_i\right) $$
(13)

Setting \( \frac{\partial J\left(\theta, \omega \right)}{\partial {\theta}_i} \) to zero, we obtain

$$ {\theta}_i=k\left({x}_i,{y}_t\right)-\sum \limits_{j=1,j\ne i}^N{\theta}_jk\left({x}_i,{x}_j\right)-\sum \limits_{j=1}^N{\omega}_jk\left({x}_i,{x}_j^{\hbox{'}}\right)-\frac{\lambda }{2}\operatorname{sgn}\left({\theta}_i\right) $$
(14)

Let

$$ {e}_{\theta}\left({x}_i\right)=k\left({x}_i,{y}_t\right)-\sum \limits_{j=1,j\ne i}^N{\theta}_jk\left({x}_i,{x}_j\right)-\sum \limits_{j=1}^N{\omega}_jk\left({x}_i,{x}_j^{\hbox{'}}\right) $$
(15)

Thus, the coefficient θi is updated with the other coefficients θj (j = 1, 2, ..., N, j ≠ i) and ωj (j = 1, 2, ..., N) held fixed. The coefficient θi is computed as

$$ {\theta}_i=\operatorname{sgn}\left({e}_{\theta}\left({x}_i\right)\right){\left[\left|{e}_{\theta}\left({x}_i\right)\right|-\frac{\lambda }{2}\right]}_{+} $$
(16)

where the function [⋅]+ is defined as

$$ {\left[h\right]}_{+}=\left\{\begin{array}{cc}h, & h>0\\ {}0, & h\le 0\end{array}\right. $$
(17)
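
Eqs. (15)–(17) together are simply a soft-thresholding (shrinkage) step; the one-function NumPy sketch below (our own naming) makes this explicit.

```python
import numpy as np

def soft_threshold(e, lam):
    """Eqs. (16)-(17): sgn(e) * [|e| - lam/2]_+, where e = e_theta(x_i) from Eq. (15)."""
    return np.sign(e) * np.maximum(np.abs(e) - lam / 2.0, 0.0)
```

Any coordinate whose correlation eθ(xi) has magnitude below λ/2 is therefore set exactly to zero, which is what produces the sparsity of the solution.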

Similarly, we take the partial derivative of J(θ, ω) with respect to ωi(i = 1, 2, ..., N), and have

$$ \frac{\partial J\left(\theta, \omega \right)}{\partial {\omega}_i}=-2k\left({x}_i^{\hbox{'}},{y}_t\right)+2\sum \limits_{j=1}^N{\theta}_jk\left({x}_j,{x}_i^{\hbox{'}}\right)+2\sum \limits_{j=1}^N{\omega}_jk\left({x}_i^{\hbox{'}},{x}_j^{\hbox{'}}\right)+\lambda \operatorname{sgn}\left({\omega}_i\right) $$
(18)

Setting \( \frac{\partial J\left(\theta, \omega \right)}{\partial {\omega}_i} \) to zero, we obtain

$$ {\omega}_i=k\left({x}_i^{\hbox{'}},{y}_t\right)-\sum \limits_{j=1,j\ne i}^N{\omega}_jk\left({x}_i^{\hbox{'}},{x}_j^{\hbox{'}}\right)-\sum \limits_{j=1}^N{\theta}_jk\left({x}_j,{x}_i^{\hbox{'}}\right)-\frac{\lambda }{2}\operatorname{sgn}\left({\omega}_i\right) $$
(19)

Let

$$ {e}_{\omega}\left({x}_i\right)=k\left({x}_i^{\hbox{'}},{y}_t\right)-\sum \limits_{j=1,j\ne i}^N{\omega}_jk\left({x}_i^{\hbox{'}},{x}_j^{\hbox{'}}\right)-\sum \limits_{j=1}^N{\theta}_jk\left({x}_j,{x}_i^{\hbox{'}}\right) $$
(20)

Also, the coefficient ωi is updated with the other coefficients ωj (j = 1, 2, ..., N, j ≠ i) and θj (j = 1, 2, ..., N) held fixed. The coefficient ωi is computed as

$$ {\omega}_i=\operatorname{sgn}\left({e}_{\omega}\left({x}_i\right)\right){\left[\left|{e}_{\omega}\left({x}_i\right)\right|-\frac{\lambda }{2}\right]}_{+} $$
(21)

where the function [⋅]+ is defined in Eq. (17).

After obtaining the coefficients θi and ωi (i = 1, 2, ..., N), we can employ them to classify the testing sample yt. To this end, we compute the following residuals

$$ {r}_k\left({y}_t\right)={\left\Vert \phi \left({y}_t\right)-\Big[\phi (X),\phi \left({A}_I\right)\Big]\left[\begin{array}{c}{\delta}_k\left(\theta \right)\\ {}\omega \end{array}\right]\right\Vert}_2^2,\left(k=1,2,...,c\right) $$
(22)

where δk(θ) ∈ RN is a vector whose only nonzero entries are the entries in θ associated with class k. Then, Eq. (22) can be rewritten as

$$ {r}_k\left({y}_t\right)=k\left({y}_t,{y}_t\right)-2k\left(\bullet, {y}_t\right){\delta}_k\left(\theta \right)-2k\left(\circ, {y}_t\right)\omega +{\delta}_k{\left(\theta \right)}^TK{\delta}_k\left(\theta \right)+{\omega}^T G\omega +2{\delta}_k{\left(\theta \right)}^T{K}^{\ast}\omega $$
(23)

Hence, the class label of yt is determined by

$$ Identity\left({y}_t\right)=\arg \underset{k}{\min }{r}_k\left({y}_t\right),\left(k=1,2,...,c\right), $$
(24)

In our algorithm, the coefficient vectors θ and ω are updated iteratively by Eqs. (16) and (21). The initialization of θ and ω is computed by solving the equation ϕ(yt) = ϕ(X)θ + ϕ(AI)ω with the kernel trick, which gives

$$ {\left[\begin{array}{c}\theta \\ {}\omega \end{array}\right]}_{in}={\left({P}^TP+\mu I\right)}^{-1}{P}^Tk\left(\bullet, {y}_t\right)= Sk\left(\bullet, {y}_t\right) $$
(25)

where P = [K, K*], S = (PTP + μI)−1PT, μ is a small positive value, and I is the identity matrix. The following Algorithm 1 summarizes our FKESRC.


Algorithm 1. Fast KESRC algorithm

In the above algorithm, the number of iterations m is usually a small integer, say, 3 to 10, and can easily lead to a sparse solution. In our experiments, m is set to 3.
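
For concreteness, the whole test-time procedure, from the initialization of Eq. (25) through m coordinate-descent sweeps (Eqs. (15)–(16) and (20)–(21)) to the residual-based decision of Eqs. (23)–(24), can be sketched as follows. This is our own illustrative NumPy transcription, not the authors' reference code: it assumes the intraclass variation matrix is built in the input space with the Gaussian kernel k(x, y) = exp(−‖x − y‖²/σ²) of Subsection 3.1 applied to its columns, and it sweeps θ first and then ω within each of the m iterations.

```python
import numpy as np

def gaussian_gram(A, B, sigma):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / sigma^2) between the columns of A and B."""
    sq = np.sum(A**2, 0)[:, None] + np.sum(B**2, 0)[None, :] - 2.0 * A.T @ B
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def fkesrc_classify(X, labels, y_t, sigma, lam=0.01, mu=1e-3, m_iter=3):
    """Classify one test sample y_t (shape (D,)) with the FKESRC procedure sketched above."""
    N = X.shape[1]
    # intraclass variation matrix of Eq. (2), built in the input space
    A_I = np.empty_like(X)
    for c in np.unique(labels):
        idx = labels == c
        A_I[:, idx] = X[:, idx] - X[:, idx].mean(axis=1, keepdims=True)
    # kernel blocks of Eq. (11)
    K = gaussian_gram(X, X, sigma)
    G = gaussian_gram(A_I, A_I, sigma)
    Ks = gaussian_gram(X, A_I, sigma)                         # K*
    k_dot = gaussian_gram(X, y_t[:, None], sigma).ravel()     # k(., y_t)
    k_circ = gaussian_gram(A_I, y_t[:, None], sigma).ravel()  # k(o, y_t)
    # initialization, Eq. (25), with P = [K, K*]
    P = np.hstack([K, Ks])
    coef = np.linalg.solve(P.T @ P + mu * np.eye(2 * N), P.T @ k_dot)
    theta, omega = coef[:N].copy(), coef[N:].copy()
    # m coordinate-descent sweeps, Eqs. (15)-(16) and (20)-(21); k(x_i, x_i) = k(x'_i, x'_i) = 1
    for _ in range(m_iter):
        for i in range(N):
            e = k_dot[i] - (K[i] @ theta - theta[i]) - Ks[i] @ omega       # Eq. (15)
            theta[i] = np.sign(e) * max(abs(e) - lam / 2.0, 0.0)           # Eq. (16)
        for i in range(N):
            e = k_circ[i] - (G[i] @ omega - omega[i]) - Ks[:, i] @ theta   # Eq. (20)
            omega[i] = np.sign(e) * max(abs(e) - lam / 2.0, 0.0)           # Eq. (21)
    # class-wise residuals of Eq. (23); k(y_t, y_t) = 1 for the Gaussian kernel
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        d = np.where(labels == c, theta, 0.0)                              # delta_c(theta)
        r = (1.0 - 2.0 * k_dot @ d - 2.0 * k_circ @ omega
             + d @ K @ d + omega @ G @ omega + 2.0 * d @ Ks @ omega)
        residuals.append(r)
    return classes[int(np.argmin(residuals))]                              # Eq. (24)
```

In a batch setting, everything that depends only on the training set (K, G, K* and (PTP + μI)−1PT) can be computed once and reused for every test sample, which is what the complexity analysis in the next subsection assumes; the sketch recomputes these quantities per call only for brevity.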

3.3 Time complexity

In this subsection, we analyze the time complexity of our algorithm. The analysis contains three main parts. The first part is the preprocessing or initialization procedure, which computes the matrices K, G and K* in Eq. (11), and (PTP + μI)−1PT in Eq. (25). The time complexity of this procedure is about O(N2(3D + 16N)), where N is the number of training samples and D is the data dimensionality. The second part is representing the testing samples. For each testing sample, the time complexity of the representation is O(mN2), where the number of iterations m is usually a relatively small integer (m = 3 in this work). Hence, the total time complexity of this part is O(mN2T), where T is the total number of testing samples, which is often far larger than N. The third part is calculating the residuals and classifying each testing sample; its time complexity is roughly O(N2) per sample, i.e., about O(N2T) in total. Note that the typical SRC algorithm [38] contains two principal procedures: representing and classifying the testing samples. The time complexities of representing and classifying all the testing samples are O(NeD2T), where e ≥ 1.2, and O(NDT), respectively [13, 44].

From the above analysis, we know that if the number of testing samples is sufficiently large, the second part, i.e., the representation part, whose time complexity is not related to the data dimensionality, is the most time-consuming procedure in our algorithm. The representation procedure is also the most time-consuming part of the typical SRC algorithm; however, its time complexity is tightly related to the data dimensionality. Therefore, compared with the typical SRC algorithm, our algorithm is computationally efficient when the number of testing samples is large and the data dimensionality is high.

4 Experiments

This section verifies the computational efficiency and recognition effectiveness of the proposed FKESRC algorithm through several experiments on four popular real-world face databases: the GT, AR, YaleB and LFW (labeled faces in the wild) databases [3, 25, 35]. For comparison, we also implemented other state-of-the-art representation based classification algorithms: the collaborative representation for classification (CRC) [52], which is based on the L2 norm minimization, and typical classification algorithms based on the L1 norm minimization, namely SRC [38], SRC solved with the coordinate descent scheme (SCD), KSRC [53], ESRC [8] and its kernel version, i.e., kernel ESRC (KESRC). The sparse representation procedures of these algorithms are implemented with the l1_ls package [20, 42], which solves the L1 norm minimization problem and yields high-quality classification performance. Similar to ESRC, our proposed algorithm can be viewed as a dictionary learning method, so we also implemented two recently proposed dictionary learning methods: the kernel extended dictionary (KED) [18] and the locality-constrained and label embedding dictionary learning (LCLEDL) [22] method. In the experiments, the kernel function in KSRC, KESRC and our proposed FKESRC is the Gaussian kernel k(x, y) = exp(−‖x − y‖²/2σ²), where σ is the Gaussian kernel parameter; it is set to r × d, where d is the average Euclidean distance between the training samples and r, referred to as the kernel ratio, is specified on each data set. The error tolerance ε in Eq. (4) is set to 1e-3. Since the SRC, KSRC, ESRC and KESRC algorithms are very time-consuming, the dimensionality of all face images is reduced by principal component analysis (PCA) in all experiments to save computational time. For a fair comparison, the image dimensionality in our proposed algorithm and the other algorithms is also reduced by PCA. All the experiments are run on a platform with a 3.4 GHz CPU and 8.0 GB RAM using Matlab R2016b.
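
The mapping from the kernel ratio r to the Gaussian kernel width σ = r × d can be reproduced with the small NumPy sketch below (our own code, not the paper's; the paper does not state whether the average Euclidean distance d is taken over distinct sample pairs or over all ordered pairs, and the sketch averages over distinct pairs).

```python
import numpy as np

def kernel_width(X, r):
    """sigma = r * d, where d is the average Euclidean distance between the training columns of X (D x N)."""
    sq = np.sum(X**2, 0)[:, None] + np.sum(X**2, 0)[None, :] - 2.0 * X.T @ X
    dists = np.sqrt(np.maximum(sq, 0.0))
    d = dists[np.triu_indices(X.shape[1], k=1)].mean()   # mean over distinct pairs
    return r * d

# e.g., sigma = kernel_width(X_train, r=0.8)   # r = 0.8 is the FKESRC kernel ratio used on the GT subsets
```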

4.1 Experiments on the GT database

The first experiment was conducted on the GT (Georgia Tech) face database, which was built at the Georgia Institute of Technology. This database contains 50 individuals with 15 color images per individual. The face images of each individual exhibit several variations in pose, expression, and illumination [25]. Figure 1 shows some samples from this database. All the images were first converted to greyscale, and then cropped and resized to a resolution of 60 × 50 pixels. We randomly split the image samples of each subject into training and testing parts. For each individual, a few images (N = 3, 4, 5 and 6) were randomly selected for training, and the rest were used for testing.

Fig. 1

Some examples of face images in the GT face database

The face image data dimensionality was reduced to 40 by PCA in this experiment. In the KSRC algorithm, the Gaussian kernel parameter was set to 0.1 × d; that is, the kernel ratio was 0.1, the same as that of KESRC in all experiments. The kernel ratio in our FKESRC algorithm was set to 0.8, and the parameter λ in Eqs. (16) and (21) was set to 0.01. We randomly ran each algorithm 10 times on each training subset. Table 1 gives the recognition rates (mean ± standard deviation, in percent) on the four training subsets (denoted by N = 3, 4, 5 and 6 in Table 1). From this table, we can see that when N = 4 and 5, FKESRC obtained the highest recognition rates among all the algorithms except KESRC. Compared with the other algorithms, our FKESRC achieves desirable classification performance.

Table 1 Recognition rates on the GT database

Our method is a typical SRC based algorithm whose sparse model is built on the L1 norm minimization, as are the models of SRC, SCD, KSRC, ESRC and KESRC. Therefore, for a fair comparison, we compared these algorithms with our method in terms of computational time. For each algorithm, including our FKESRC, we denote the total computational time by Ct; the average computational time is then Ct/10 (each algorithm was randomly run 10 times, as mentioned above). Table 2 reports the average computational time of the six algorithms (SRC, SCD, KSRC, ESRC, KESRC and our FKESRC). Note that the ESRC and KESRC algorithms are slower than the SRC and KSRC algorithms because their models involve the intraclass variation matrix, and directly solving them therefore consumes more CPU time. Our FKESRC model also uses the intraclass variation matrix; however, our algorithm is much more computationally efficient than the ESRC and KESRC algorithms since it exploits the coordinate descent approach. Because our method needs to compute the intraclass variation matrix, it is slightly slower than the SCD algorithm, which does not; however, the recognition rates of our method are higher than those of SCD. From Table 2, we can observe that the computational time of our approach is only 0.6% of that of KESRC when N = 3, and when N = 4, the ratio of the computational time of FKESRC to that of KESRC is only 0.5%. From Tables 1 and 2, we can observe that the proposed FKESRC achieved desirable recognition results while being the second fastest algorithm among all tested algorithms.

Table 2 Computational time (s) on the GT data set

4.2 Experiment on the AR database

The second experiment was conducted on the AR data set [29], which contains over 4000 face images of 126 individuals. Each individual includes frontal views with different facial expressions, lighting conditions, and occlusions [11, 45]. Figure 2 shows some examples of a test subject in the AR database. We used the face images of 120 people, each with 26 images. All the images were first converted to greyscale, and then cropped and resized to a resolution of 20 × 25 pixels. Again, we randomly split the image samples of each individual into two parts, one for training and the other for testing. As before, the numbers of training images chosen for each person were 3, 4, 5 and 6 (N = 3, 4, 5 and 6).

Fig. 2

Some examples of face images in the AR database

The face image data dimensionality was reduced to 300 by PCA in this experiment. As before, we needed to set the parameters in KSRC and our FKESRC. In the KSRC algorithm, the kernel ratio was 1e-3; the kernel ratio in our FKESRC algorithm was set to 1.5 by experimentation, and the parameter λ in Eqs. (16) and (21) was again set to 0.01. We randomly ran each algorithm 10 times on each subset. Figure 3 depicts the recognition rates of all algorithms on the AR database, where N denotes the number of training images of each individual. From this figure, we can see that our proposed algorithm achieved recognition results comparable with those of the other algorithms. Most importantly, our approach was much more computationally efficient than the SRC, KSRC, ESRC and KESRC approaches. The computational time of these algorithms is reported in Table 3. As expected, the ratio of the computational time of our algorithm to that of the other algorithms was still very small; for example, it was only 1.1% of that of the KESRC algorithm when N = 3, and our approach was approximately 40 times faster than the SRC algorithm. From Table 3, we can find that FKESRC was the fastest algorithm among all tested. Moreover, the average time of representing and classifying each testing sample was only 0.01 s in our approach when N = 3, and not more than 0.03 s when N = 6, whereas the ESRC algorithm needed nearly 1 s for the same job. Therefore, our method can be applied in real-time recognition applications on the AR data set. The main reason for the very high computational efficiency of our approach and the SCD algorithm is that these two algorithms avoid directly solving the L1 norm minimization problem, whereas the other algorithms in our experiments require the direct solution of the L1 norm minimization problem, which is very time consuming, as demonstrated in the experiments.

Fig. 3

Recognition rates on AR database

Table 3 Computational time (s) on the AR data set

4.3 Experiment on the YaleB database

We conducted the third experiment on the YaleB face data set, which contains 10 subjects with 5850 face images in total, i.e., 585 face images per individual [2]. The YaleB data set is a widely used illumination data set whose images span a large range of possible illuminations [3]. Figure 4 shows some image examples of a subject in the YaleB database. The goal of this experiment is to investigate the performance of our algorithm under different reduced dimensionalities.

Fig. 4

Some examples of a test subject in the YaleB database

In the YaleB data set, each image was manually aligned and cropped, and the size of each image was 30 × 40. Also, we randomly split the image samples of each individual into training and testing parts, with 10 training images per individual. The face image data dimensionality was reduced to 40, 50, 60, 70 and 80 by PCA, which gave us five training subsets.

In this experiment, the kernel ratio of the KSRC algorithm was 0.1, the kernel ratio of our FKESRC algorithm was set to 0.9, and the parameter λ in Eqs. (16) and (21) was again set to 0.01. Also, we randomly ran each algorithm 10 times on each subset. Figure 5 shows the recognition rates of all algorithms on the YaleB database, where D denotes the dimensionality of the face images. From this figure, we can see that our proposed algorithm achieved desirable recognition results and performed comparably with the other algorithms in terms of recognition rates.

Fig. 5

Recognition rates on YaleB database

As in the first two experiments, we paid particular attention to the computational efficiency of the six SRC based algorithms. Table 4 reports their computational time. From this table, we can conclude that ESRC and KESRC were far slower than the other algorithms since they used the intraclass variation matrix and directly solved the L1 norm minimization problem, as mentioned in the previous experiments. Although our FKESRC model also involves the intraclass variation matrix, our method was far faster than ESRC and KESRC, since our approach avoids directly solving the L1 norm minimization problem. According to Table 4, the computational time of our algorithm was only 0.8% of that of the KESRC algorithm when D = 80. On the other hand, although the SRC and KSRC algorithms did not involve the intraclass variation matrix, both of them were far slower than our approach: our approach was about 50 times faster than the KSRC algorithm, which was itself faster than the SRC, ESRC and KESRC methods. From Table 4, we can also observe that the average time of representing and classifying each testing sample was only about 0.002 s in our approach. Therefore, our method is more suitable for real-time recognition applications on the YaleB database than the other algorithms.

Table 4 Computational time (s) on the YaleB data set

4.4 Experiment on the LFW database

The fourth experiment was conducted on the LFW (labeled faces in the wild) database, in which all face images were collected from the web [13]. This data set contains over 13,000 face images. Figure 6 gives several face image samples of one individual in the LFW database. We used the face images of 86 individuals, each with 11 images. All the images were first converted to greyscale, and then cropped and resized to a resolution of 32 × 32 pixels. Similarly, we randomly split the image samples of each individual into two parts, one for training and the other for testing. For each individual, a few images (N = 7 and 9) were randomly selected for training, and the rest were used for testing.

Fig. 6

Some examples of one individual in the LFW database

The face image data dimensionality was reduced to 200 by PCA in this experiment. In the KSRC algorithm, the kernel ratio was 1e-3, and the kernel ratio in our FKESRC algorithm was set to 0.5. Also, we randomly ran each algorithm 10 times on each training subset. Table 5 reports the recognition rates of all algorithms. From this table, we can observe that our approach achieves classification performance comparable with that of the KSRC, ESRC and KESRC algorithms, and outperforms the CRC, SRC, SCD, LCLEDL and KED methods.

Table 5 Recognition rates on the LFW database

In this experiment, we report the computational time of SRC, SCD, KSRC, ESRC, KESRC and our method in Table 6. According to this table, our method is the second fastest among these six algorithms. Our method is a variant of KESRC and ESRC, and the proposed FKESRC is far faster than both of them. From Table 6, we can see that FKESRC is about 60 times faster than KESRC, although KESRC is slightly better than our method in terms of recognition rates. On the LFW database, the average time of representing and classifying each testing sample was only about 0.02 s in our approach when N = 7. Hence, we can conclude that FKESRC is a real-time sparse representation based classification approach.

Table 6 Computational time (s) on the LFW data set

4.5 Determining the kernel ratio for FKESRC

In this subsection, we explain how to determine the kernel ratio parameter used in our FKESRC algorithm on the four face databases. We choose, from the set Z = {0.01, 0.02, ..., 10} (termed the kernel ratio set), the kernel ratio that leads to the best recognition result on each database. To illustrate the selection procedure, we first choose one subset from each database; as in the previous experiments, each subset includes one training subset and one testing subset. Specifically, for the GT data set, the number of training images per person is three, i.e., N = 3; similarly, for the AR data set, N = 3; for the YaleB data set, N = 10; and for the LFW data set, N = 7. Then, we ran FKESRC once with each kernel ratio in Z on the chosen subsets. The recognition results obtained with the different kernel ratios are shown in Fig. 7, where the abscissa denotes the kernel ratio and the ordinate the recognition rate. From this figure, we can observe that when the kernel ratio is close to 1, our FKESRC tends to achieve higher recognition rates. Hence, it is easy to determine a suitable kernel ratio on all databases.
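
This selection amounts to a plain grid search over Z. A hypothetical sketch is given below; it reuses the illustrative fkesrc_classify and kernel_width helpers defined in our earlier code sketches (not the authors' code), and the 0.01 step in the ratio grid is only our reading of the set Z above.

```python
import numpy as np

def select_kernel_ratio(X_tr, y_tr, X_te, y_te, ratios, kernel_width, classify):
    """Grid search: return the kernel ratio giving the highest accuracy on the chosen test subset."""
    best_r, best_acc = None, -1.0
    for r in ratios:
        sigma = kernel_width(X_tr, r)                                   # sigma = r * d
        pred = np.array([classify(X_tr, y_tr, x, sigma) for x in X_te.T])
        acc = float(np.mean(pred == y_te))
        if acc > best_acc:
            best_r, best_acc = r, acc
    return best_r, best_acc

# ratios = np.arange(0.01, 10.0 + 1e-9, 0.01)   # one reading of Z = {0.01, 0.02, ..., 10}
```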

Fig. 7

Recognition rates of different kernel ratios on four databases. a GT; b AR; c YaleB; d LFW

5 Conclusion

In this work, we proposed a fast kernel sparse representation based classification algorithm, FKESRC, which aims to solve the undersampling problem in face recognition. Theoretically, our approach addresses the undersampling problem well. FKESRC can also capture the nonlinear information within the data set and achieve desirable classification effectiveness, as demonstrated by the experiments in Section 4. Moreover, it is a computationally efficient sparse representation based classification method. Although our algorithm and the typical SRC, KSRC, ESRC and KESRC algorithms are all based on the L1 norm minimization problem, FKESRC does not need to solve the L1 norm minimization problem directly, while the other algorithms do. Among all the compared algorithms, our proposed method offers a good trade-off between recognition rate and computational time, and it can be applied in real-time classification applications such as face recognition. Besides, the basic idea of our method can potentially be exploited in lung prediction [50], background subtraction [24] and saliency detection [17].