Robust sparse manifold discriminant analysis

Wang, Jingjing; Liu, Zhonghua; Zhang, Kaibing; Wu, Qingtao; Zhang, Mingchuan

doi:10.1007/s11042-022-12708-3

Robust sparse manifold discriminant analysis

Published: 12 March 2022

Volume 81, pages 20781–20796, (2022)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Multimedia Tools and Applications Aims and scope Submit manuscript

Robust sparse manifold discriminant analysis

Download PDF

Jingjing Wang¹,
Zhonghua Liu¹,
Kaibing Zhang²,
Qingtao Wu¹ &
…
Mingchuan Zhang¹

386 Accesses
2 Citations
1 Altmetric
Explore all metrics

Abstract

Classical linear discriminant analysis (LDA) has been applied to machine learning and pattern recognition successfully, and many variants based on LDA are proposed. However, the traditional LDA has several disadvantages as follows: Firstly, since the features selected by feature selection have good interpretability, LDA has poor performance in feature selection. Secondly, there are many redundant features or noisy data in the original data, but LDA has poor robustness to noisy data and outliers. Lastly, LDA only utilizes the global discriminant information, without consideration for the local discriminant structure. In order to overcome the above problems, we present a robust sparse manifold discriminant analysis (RSMDA) method. In RSMDA, by introducing the L_2,1 norm, the most discriminant features can be selected for discriminant analysis. Meanwhile, the local manifold structure is used to capture the local discriminant information of the original data. Due to the introduction of L_2,1 constraints and local discriminant information, the proposed method has excellent robustness to noisy data and has the potential to perform better than other methods. A large number of experiments on different data sets have proved the good effectiveness of RSMDA.

Sparse L1-norm-based linear discriminant analysis

Article 13 September 2017

A supervised discriminant data representation: application to pattern classification

Article Open access 20 May 2022

Linear Discriminant Analysis with Maximum Correntropy Criterion

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Feature selection and extraction, which have received extensive attention in recent years, play an important role in pattern classification [10, 14, 41]. However an image raw data contains a large number of redundant features and noise, which make difficult for image recognition and image analysis. [18, 19, 27]. In this case, for classification tasks, how to select and extract the different categories of the most significant features in the whole exercise is among the most difficult. It has proven that feature selection and extraction are effective tools in the field of machine learning and pattern classification. They can reduce the complexity, increase the efficiency and enhance the classification performance [20, 25, 26].

Whether it is feature selection or feature extraction, to a certain extent, they both are subspace learning methods [2, 11, 13]. They have common goal in finding a low-dimensional representation of the original high-dimensional data in a new learning space [28, 30, 33].

Various methods of feature extraction have been proposed in the past few decades. In this branch, principal component analysis (PCA) [38] is one of the most famous method, its main idea is to try to learn a projection that can preserve the main energy of the original data. Local preserving projection (LPP) [7], sparse preserving projection (SPP) [24] and neighborhood preserving embedding (NPE) [8] which they learn their projections are also feature extraction methods, these methods consider the local manifold geometry of the original data and try to preserve the local information in the projection space [4, 16, 29]. Subsequently, classifiers, such as K Nearest Neighbor (KNN) [42] and Support Vector Machine (SVM) [3], are usually used for classification. Although the above methods don’t use the label information of the data, have different advantages, the classification performance of these algorithms are not good enough to some extent.

In fact, we hope that dimensionality reduction can be achieved for data with category labels (supervised), and each category can be better distinguished after dimensionality reduction. At this time, another classic feature extraction algorithm-linear discriminant analysis (LDA) appeared [43]. LDA is a classic linear learning method. The idea of linear discrimination is simple but very effective. LDA can greatly expand the distance between classes and reduce the distance within classes. And through the use of label information to learn discriminant projection, which improves the classification accuracy [9, 17, 40].To enhance performance and efficiency, many variants based on LDA are proposed [5, 12, 15] such as orthogonal LDA (OLDA) [31], unrelevant LDA (ULDA) [32]. OLDA and ULDA transform the image into a vector for learning projection, which aim to solve the problem of small LDA sample size. However, these methods, which based on LDA in the calculation of the L₂ norm scatter matrix, may cause the error seriously, and these methods are sensitive to outliers [1, 34, 44]. Later, sparse linear discriminant analysis (SLDA) [23] and sparse uncorrelated linear discriminant analysis (SULDA) [39] are proposed to learn sparse discriminant subspace for feature extraction. Robust Adaptive Linear Discriminant Analysis (RALDA) [6] method achieved an appropriate latent subspace for data representation where L_2,1 norm is adopted in the formulations of loss function,which the regularization term can reduce the impact of outliers and noise and predict the select discriminative features. Later, Ning et.al. proposed a method named BULDP: Biomimetic Uncorrelated Locality Discriminant Projection for Feature Extraction in Face Recognition, it is based on unsupervised discriminating projection and two human bionic where in: homologous and isomeric principle of continuity principle of similarity [21]. Later, Ning et.al. proposed Real-time 3D Face Alignment Using an Encoder-Decoder Network with an Efficient Deconvolution Layer [22]. Zhang et.al. proposed a new iterative reweight-based log-sum constraint channel estimation scheme. it used the structure sparsity of the mmWave channels by formulating the channel estimation problem as an objective optimization problem. [35].Later, Zhang et.al. proposed Block-Sparsity Log-Sum-Induced Adaptive Filter for Cluster Sparse System Identification The main idea of the proposed scheme is to add a new block-sparsity induced term into the cost function of the LMS algorithm [36]. In addition, he proposed Block-Sparsity Log-Sum-Induced Adaptive Filter for Cluster Sparse System Identification [37] . The main idea is to add a sparsity lp norm penalty cost function of the LMS algorithm.

However, LDA has some obvious disadvantages. Firstly, each new feature combined with others, which the most of projection coefficients are not zero. The learned projection matrix cannot explain the features well. This also shows that LDA cannot select the most useful function from redundant data. Secondly, LDA selects k feature vectors as the projection of feature extraction, and the selected k feature vectors will have a one-to-one correspondence with the first k smallest feature values, and the number of k is related to the data. For dimensionality reduction, this makes the choice of the classification accuracy of linear discriminant analysis sensitive. Thirdly, LDA, and many methods based on linear discriminant analysis are sensitive to noise.

In order to overcome this problem, a new robust sparse manifold discriminant analysis (RSMDA) algorithm is proposed for dimension reduction. Especially, the proposed RSMDA method uses L_2,1 norm based sparse constraint to choose the important feature. At the same time, the manifold based local discrimination information is added on the basis of the original linear discrimination analysis. The innovations of this article mainly include the following aspects.

(1)
The most useful and discriminative features can be selected by introducing the L_2,1 norm.
(2)
Compared with other methods based on linear discriminant analysis, the proposed method is more robust to noise.
(3)
By introducing the local discriminative information, the sparse projection can use the local identification information to enhance the discrimination of projection.

The remaining chapters of this article are specifically arranged as follows. Section 2 mainly reviews some related work. The third part describes the robust discriminant analysis of sparse manifolds in detail and gives the optimal iterative scheme of RSMDA. In Section 4, There are a large number experiments on different libraries to prove the good performance. Section 5 gives the corresponding conclusion.

2 Related work

In this section, we will briefly review the research work, which mainly includes LDA and manifold based local discriminant information learning. For convenience, Table 1 introduce some notations used through the paper.

Table 1 Significant Notations Annotated in This Paper

Full size table

2.1 Linear discriminant analysis (LDA)

Suppose there are c pattern classes, n_i represents the number of samples of the i^th class, $ \mathrm{n}={\sum}_{i=1}^c $ is the total number of all samples, column vector $ {x}_j^i\in {R}_m $ denotes the j^th sample of the i^th class. LDA tries to find a projection matrix, which makes the samples in the same category as close as possible, and makes the samples in different categories are as far away as possible. LDA uses the following fisher to obtain this projection vector.

$$ \mathrm{a}=\arg \kern0.33em \underset{a}{\max}\frac{a^T{S}_ba}{a^T{S}_wa} $$

(1)

where S_b is the inter-class scatter matrix, and S_w is the intra-class scatter matrix. S_b and S_w are calculated as following.

$$ {S}_b=\frac{1}{n}{\sum}_{i=1}^c{n}_i\left({u}_i-u\right){\left({u}_i-u\right)}^T $$

(2)

$$ {S}_w=\frac{1}{n}{\sum}_{i=1}^c{n}_i{\sum}_{j=1}^{n_i}\left({x}_j^i-{u}_i\right){\left({x}_j^i-{u}_i\right)}^T $$

(3)

where $ {u}_i=\frac{1}{n_i}{\sum}_{j=1}^{n_i}{x}_j^i $ is the mean feature of samples of the ith class, $ {u}_i=\frac{1}{n_i}{\sum}_{i=1}^c{\sum}_{j=1}^{n_i}{x}_j^i $ is the mean feature of all samples. Generally, problem (1) is equivalent to the following optimization problem .

$$ {\displaystyle \begin{array}{c}a=\mathrm{argmin}{a}^T\left({S}_w-\mu {S}_b\right)a\\ {}s.t.\kern0.33em {a}^Ta=I\end{array}} $$

(4)

where μ is a small positive constant.

By solving Eq.(4), we can observe that the optimal projection vector a, it is the eigenvector corresponding to the minimum eigenvalue of S_wa = μS_ba. Generally, a single projection vector is not enough to distinguish multiple classes. In real world applications, we usually select a set of projection vectors which satisfy the optimal Fisher criterion $ A=\arg \kern0.33em \underset{A^TA=1}{\min }T\mathrm{r}\left({A}^T\left({S}_{\mathrm{w}}-\mu {S}_b\right)A\right) $ for multiclass classification. The projection matrix X is selected as a set of eigenvectors corresponding to the first k smallest eigenvalues of S_wA = λS_bA. Let A = [a₁, a₂, .., a_k] ∈ R^m × k be the set of the selected k eigenvectors, we can obtain discriminative feature vector $ {y}_j^i\in {R}^k $ of each sample by $ {y}_j^i={A}^T{x}_j^i $.

2.2 Local manifold learning

The nearest neighbor graph G is first constructed and its weight matrix is defined as follows:

$$ {W}_{ij}=\left\{\begin{array}{c}1,\kern0.66em if\kern0.33em {x}_i\in N\left({x}_j\right)\kern0.33em or\kern0.33em {x}_j\in N\left({x}_i\right)\\ {}0,\kern0.66em otherwise\end{array}\right. $$

(5)

where N(∙) is a set of k nearest neighbors, the nearest neighbor graph G is divided into two graphs, i.e., within-class graph G_w and between-class graph G_b for extracting local discriminant information of samples. For each sample x_i, its k nearest neighbors are split into two subsets, N_w(x_i) and N_b(x_i). N_w(x_i) contains the neighbors sharing the same label with x_i, whereas N_b(x_i) contains the neighbors that have different labels. Let W^w and W^b be the weight matrices of G_w and G_b, respectively. Let W^w and W^b be the weight matrices of G_w and G_b, respectively. In each of the graphs defined above, if two samples have a nonzero weight, then they are referred to as connected samples. It is clear to see that W = W^w + W^b. In this way, the conversion matrix A can map samples from the original sample space to the label space.The result is that the connected samples of G_w stay as close together as possible, while the connected samples of G_b stay as far as possible. Given a map as f_i = x_iA (i = 1, …, n), it should satisfy the following objective functions:

$$ \min \sum \limits_{ij}{\left\Vert {f}_i-{f}_j\right\Vert}^2{W}_{ij}^w $$

(6)

$$ \max \sum \limits_{ij}{\left\Vert {f}_i-{f}_j\right\Vert}^2{W}_{ij}^b $$

(7)

In Eq. (6), if x_i and x_j are close and have the same label, then f_i and f_j should be close as well. In Eq. (7), if x_i and x_j are close but have different labels, then f_i and f_j should be far from each other. (6) and (7) can be converted into

$$ \kern0.99em \min \sum \limits_{ij}{\left\Vert {f}_i-{f}_j\right\Vert}^2{W}_{ij}^w=\min Tr\left({A}^T{X}^T{L}_w XA\right) $$

(8)

$$ \kern0.99em \max \sum \limits_{ij}{\left\Vert {f}_i-{f}_j\right\Vert}^2{W}_{ij}^b=\max Tr\left({A}^T{X}^T{L}_b XA\right) $$

(9)

where L_w and L_b are the Laplacian matrix of G_w and G_b. They are defined as L_w = D^w − W^w, L_b = D^b − W^b. D^w and D^b are diagonal matrices, which their diagonal entries are $ {D}_{ii}^w={\sum}_j{W}_{ij}^w $ and $ {D}_{ii}^b={\sum}_j{W}_{ij}^b $.

Finally, the Eqs. (8) and (9) can be rewritten as

$$ {\displaystyle \begin{array}{c}\underset{\mathrm{A}}{\min}\left( Tr\left({A}^T{X}^T{L}_w XA\right)- Tr\left({A}^T{X}^T{L}_b XA\right)\right)\\ {}=\underset{\mathrm{A}}{\min } Tr\left({A}^T{X}^T\left( Lw-{L}_b\right) XA\right)\end{array}} $$

(10)

3 Robust sparse manifold discriminant analysis (RSMDA)

In this section, the motivation of our RSMDA is firstly introduced. Then the optimization solution of RSMDA is given.

3.1 The motivation of RSMDA

The main object of LDA is projection. The advantage of this algorithm is that the projection, it shortens the projection distance of the same type of sample, and it can also increase the distance of different types of samples. But the LDA also has its design defects. Most of the projection coefficients are non-zero, and the new feature samples are linear combinations of all sample features, which leads to the lack of good explanatoryness of the projective matrix to the characteristics, which is reflected in the following aspects. First of all, LDA does not have a good explanation for matrix features, but it cannot select the most suitable function in a large amount of redundant data. Second, when LDA selects k eigenvector corresponding to the first k minimum eigenvalue as a feature extraction projection, it is greatly affected by the data, and the LDA data classification varies greatly. This leads to great differences in LDA classification accuracy of different sizes. Third, most LDA-based implementation methods cannot be ignored by external environmental impacts, such as many methods that are vulnerable to noise. In this paper, we propose a robust sparse manifold discriminant analysis (RSMDA) to solve the above problems.

At the same time, in practical applications, the acquired data is generally high-dimensional. Data contains a large amount of redundant information, and some noise can corrupt data or images. Therefore, the choice of those essential features from the original complex data discriminant analysis in the whole learning process is vital, because it can effectively reduce the negative impact caused by redundant features. Applying a sparse norm constraint projection allows the model to perform feature selection, such as the L_2,1 norm. The L_2,1 norm has good line sparsity compared with the L₁ norm, precisely because of its property, which makes it easier for the projection to interpret elements. Sparse projection discrimination information can grasp the local identification of the projection to extend. At the same time, we are concerned about the situation of random noise. We use this term to compensate for the noise sparse so that can reduce the negative impact to some extent. Inspired by this motive, we recommend learning more powerful by using the following constraints discrimination subspace

$$ {\displaystyle \begin{array}{c}\underset{P,Q,E}{\min }T\mathrm{r}\left({Q}^T\left({S}_w-\mu {S}_b\right)Q\right)\kern0.33em +{\lambda}_1{\left\Vert Q\right\Vert}_{2,1}+{\lambda}_2{\left\Vert E\right\Vert}_1\\ {}s.t.X=P{Q}^TX+E,{P}^TP=I\end{array}} $$

(11)

where Q ∈ R^m × d(d < m) is divided into scattering matrices between different classes and within the same class. λ₁ and λ₂ are trade-off parameters, μ is a small positive constant used to balance the importance of S_b and S_w. LDA realizes maximizing the interclass scattering matrix S_w and minimizing the internal class scattering matrix S_b according to the balanced optimization projection matrix. E represents errors and it is used to simulate random noise. ‖⋅‖₁ is the L₁ norm. In some cases, X = PQ^TX and P^TP = I can be regarded as variants of PCA, which has the advantage that the original data can be recovered well. P ∈ R^m × d is an orthogonal reconstruction matrix. By reconfiguring the relationship between the sample and the original sample that is considering conversion, on a reduced dimension, the transformed data can be retained as much as possible the main energy of the original data. In this way, RSMDA not only learn discriminant subspace, but also learn optimization framework with the minimum loss of information, it is possible to perform better.

The research shows that the discriminant analysis method based on global structure information ignores the local information of the image. On the basis of manifold technology, the original image is reduced and dimensionally processed, the local data manifold structure in the nonlinear sub-manifold can be better maintained. By introducing the manifold based local discriminative information, the objective function of the proposed RSMDA can be rewritten as follows.

$$ {\displaystyle \begin{array}{c}\underset{P,Q,E}{\min }T\mathrm{r}\left({Q}^T\left({S}_w-\mu {\mathrm{S}}_b\right)Q\right)+T\mathrm{r}\left({Q}^T{X}^T\left({L}_w-{L}_b\right) XQ\right)\\ {}+{\lambda}_1{\left\Vert Q\right\Vert}_{2,1}+{\lambda}_2{\left\Vert E\right\Vert}_1\\ {}s.t.X=P{Q}^TX+E,\kern0.43em {P}^TP=I\end{array}} $$

(12)

3.2 Optimization of RSMDA

In this section, we propose an iterative algorithm to update the rules to solve the optimal solution in eq. (12). Since the objective function is non-convex and contains four different variables. It’s difficult for us to get the global optimal solution. Therefore, we can obtain the local optimal solution through continuous iteration. First of all, we use the Lagrange function to convert the problem (12) to the following form.

$$ {\displaystyle \begin{array}{c}L\left(P,Q,E,Y\right)= Tr\left({Q}^T\left( Sw-\mu {\mathrm{S}}_b\right)Q\right)+ Tr\left({Q}^T{X}^T\left({L}_w-{L}_b\right) QX\right)\\ {}+{\lambda}_1{\left\Vert Q\right\Vert}_{2,1}\kern0.33em +{\lambda}_2{\left\Vert E\right\Vert}_1+\left\langle Y,X-{Q}^TX-E\right\rangle \\ {}+\frac{\beta }{2}{\left\Vert X-P{Q}^TX-E\right\Vert}_F^2\\ {}= Tr\left({Q}^T\left({S}_w-\mu {\mathrm{S}}_b\right)Q\right)+ Tr\left({Q}^T{X}^T\left({L}_w-{L}_b\right) QX\right)\\ {}+{\lambda}_1{\left\Vert Q\right\Vert}_{2,1}\kern0.66em +{\lambda}_2{\left\Vert E\right\Vert}_1-\frac{1}{2\beta }{\left\Vert Y\right\Vert}_F^2\\ {}+\frac{\beta }{2}{\left\Vert X-P{Q}^TX-E+\frac{Y}{\beta}\right\Vert}_F^2\\ {}\kern0.99em \end{array}} $$

(13)

where β is a penalty parameter, Y is the Lagrangian multiplier.Then P, Q, E can be alternately solved by minimizing the Lagrangian function L with other variables fixed. The solution scheme is as follows.

Fix other variables to update Q

we fix P, E and update Q by solving the following problem

$$ {\displaystyle \begin{array}{c}L(Q)= Tr\left({Q}^T\left({S}_w-\mu {S}_b\right)Q\right)+ Tr\left({Q}^T{X}^T\left({L}_w-{L}_b\right) QX\right)\\ {}+{\lambda}_1{\left\Vert Q\right\Vert}_{2,1}+\frac{\beta }{2}{\left\Vert X-{Q}^TX-E+\frac{Y}{\beta}\right\Vert}_F^2\end{array}} $$

(14)

Define $ X-E+\frac{Y}{\beta }=M $, Q can be calculated by the derivative of L(Q) with respect to Q

$$ \frac{\partial L(Q)}{\partial Q}=2\left( Sw-\mu S\mathrm{b}\right)+{\lambda}_1 DQ+\beta \left(X{X}^TQ-X{M}^T\right) $$

(15)

where D is defined as q_i the ith row of Q and $ Q=\left[\begin{array}{c}{q}_1\\ {}\kern0.33em \vdots \\ {}{q}_m\end{array}\right] $. Let ∂L(Q)/∂Q = 0, then we obtain

$$ Q={\left(2\left({S}_w-\mu {S}_b\right)+{\lambda}_1D+\beta X{X}^T\right)}^{-1}\left(\beta X{M}^T\right) $$

(16)

Fix other variables to update P

We fix Q E and update P by minimizing the following problem

$$ \underset{P^TP=I}{\min }{\left\Vert X-P{Q}^TX-E+\frac{Y}{\beta}\right\Vert}_F^2 $$

(17)

Let $ X-E+\frac{Y}{\beta }=M $. The optimization (17) is converted to

$$ {\displaystyle \begin{array}{c}\kern0.99em \underset{P^TP=I}{\min }{\left\Vert M-P{Q}^TX\right\Vert}_F^2\\ {}=\underset{P^TP=I}{\min } Tr\left({M}^TM-2{M}^TP{Q}^TX\right)\\ {}=\underset{P^TP=I}{\min } Tr\left({M}^TP{Q}^TX\right)\\ {}=\underset{P^TP=I}{\min } Tr\left({P}^TM{X}^TQ\right)\end{array}} $$

(18)

Problem (18) is an Orthogonal Procrustes problem and it can be simply solved. Suppose SVD(MX^TQ) = USV^T, then P is obtained as P = UV^T, where SVD represents the singular value decomposition operation.

Fix other variables to update E

We fix P, Q and update E by solving the following problem.

$$ \underset{E}{\min }{\lambda}_2{\left\Vert E\right\Vert}_1\kern0.33em +\frac{\beta }{2}{\left\Vert X-P{Q}^TX-E+\frac{Y}{\beta}\right\Vert}_F^2 $$

(19)

If we define $ \varepsilon =\frac{\lambda_2}{\beta } $ and $ {E}_0=X-{Q}^TX+\frac{Y}{\beta } $, in according to shrinkage calculator, problem (19) has the closed solution as follows.

$$ E= shrink\left({E}_0,e\right) $$

(20)

where shrink means the shrinkage calculator.

Fix other variables to update Lagrange multiplier

Y and β are respectively updated by using the following formulas

$$ Y=Y+\beta \left(X-P{Q}^TX-E\right) $$

(21)

$$ \beta =\min \left(\rho \beta, {\beta}_{\mathrm{max}}\right) $$

(22)

where ρ and β_max are the constant.

3.3 Computational complexity and convergence analysis of RSMDA

We analyze the computational complexity of RSMDA. the major computational cost is the matrix inverse operation. For a m × m matrix, the computational complexity of inverse operation is O(m³). The whole computational complexity of the proposed method is O(τ(m²n + m³ + 2 max(m²; mn)d + d³)), where τ is the iteration number. For simplicity, we suppose that m ≫ n, thus the computational complexity of the proposed method is O(τ(m²n + m³ + 2m²d + d³)).

In this section, two databases are selected to analyze the convergence of the proposed RSMDA. The optimization solution process of RSMDA is realized by iteratively updating three variables. It can be seen from the experimental results in Fig. 1 that the RSMDA algorithm is convergent and converges to the local optimal solution.

4 Experiments

Some experiments have been carried out in this section to prove the good performance of the proposed RSMDA algorithm. This includes classification accuracy, convergence to global optimality, and the effectiveness of high-dimensional image maintenance. In this section, five public image databases are selected to evaluate the effectiveness of the proposed method, KNN and some supervised learning methods, including SVM, LDA, SLDA, OLDA, ULDA, SULDAare chose to compare with the proposed method. At the same time, the accuracy of all test results (AC) is used as a unified evaluation standard. The standard is based on the percentage and actual results of the correct classification results.

4.1 Experiments on the Yale B face database

There are 2432 face images from 38 subjects in the Expanded Yale B Face database. 64 images of faces under different lighting conditions were provided for each subject, and we manually converted each image to a 32 × 32 Gy scale image. One experiment, as show in Fig. 2. which 10, 15, 20 and 25 samples were randomly selected as training samples and the rest as test samples was repeated 10 times, and the classification accuracy of each algorithm was show in Table 2.

Table 2 Classification accuracy (%) of different methods on the Yale B database

Full size table

Table 2 shows the experimental results obtained by different methods on the extended Yale B database. It can be seen that the proposed RSMDA method has better performance than other supervised learning methods, especially KNN, SVM and LDA.

4.2 Experiments on the CMU PIE face database

There are 41,368 face images from 38 subjects in the CMU PIE face database. These facial images were obtained from 68 subjects in different poses and different lighting conditions. In this experiment, we used a subset of the CMU PIE face database, which contains 11,554 images from 68 subjects. We manually convert each image into a 32 × 32 Gy scale image. Figure 3 shows the sample image of one of them. In this experiment, the training set was the first 10, 15, 20, and 25 images of each person, and the test set was the remaining images. Repeat the algorithm 10 times. The classification accuracy of each algorithm is shown in Table 3.

Table 3 Classification accuracy (%) of different methods on the CMU PIE database

Full size table

It can be seen from Table 3 that as the number of training samples increases, the recognition rates of KNN, SVM, LDA, SLDA, OLDA, ULDA, SULDA and the proposed RSMDA are steadily increasing. Still, no matter how many samples are, the proposed methods perform better than other contrast algorithms. This proves that RSMDA can capture as much discriminative information as possible, which is better than these comparison methods to some extent.

4.3 Experiments on the AR face database

The AR face database contains color face images of 120 people, and the total number of face images exceeds 4000. Among them, 120 subjects were photo graphed twice with different facial expressions, light conditions and shade, with a 14-day interval, and each person produced 26 images. In our experiment, out of 26 facial images of 120 people, seven were selected from each stage, or 14 face images per person. We manually converted each image to 50 by 40 pixels. Figure 4. shows the sample image of one of them. In this experiment, the training set is the first 4, 6, 8, and 12 images of each person, and the test set is the remaining images. Repeat the algorithm 10 times. The classification accuracy of each algorithm is shown in Table 4.

Table 4 Classification accuracy (%) of different methods on the AR database

Full size table

Table 4 shows the experimental results of different methods on the AR face database. It is clear from the Table 4 that when the number of training samples increases from 6 to 8, the classification accuracy of ULDA and SULDA is decreasing, while the classification accuracy of RSMDA is steadily improving, and no matter how the number of training samples changes, this method achieves the best performance in many methods.

4.4 Experiments on ORL database

The ORL database includes a total of 400 facial images, collected from 40 people, and each person provides 10 facial images. And the images were taken at different times and under a different light. The image content includes different facial expressions and facial details. Figure 5. below shows a sample image of one of them. In this experiment, the training set is the first 3, 4, 5, and 6 images of each person, and the test set is the remaining images. Repeat the algorithm 10 times. The classification accuracy of each algorithm is shown in Table 5.

Table 5 Classification accuracy (%) of different methods on the ORL database

Full size table

We can clearly see from Table 5 that the proposed RSMDA is superior to KNN, SVM, LDA, SLDA, OLDA,ULDA and SULDA.

4.5 Experiments on Georgia Tech database

The Georgia tech face database contains photos of 50 people taken during two or three sessions. The faces in these pictures may be front and tilted, or they may be front or tilted. These images include different expression, illumination and proportion. Each image is manually cropped to 60 by 50 pixels. Be converted to grayscale images. Figure 6 shows the sample image of one of them. In this experiment, the training set was the first 5, to 8 images of each person, and the test set is the remaining images. Repeat the algorithm 10 times. The classification accuracy of each algorithm is shown in Table 6.

Table 6 Classification accuracy (%) of different methods on the Georgia Tech database

Full size table

Table 6 shows the RSMDA method has better performance than KNN, LDA, OLDA and ULDA, and the recognition rate is higher. Compared with SULDA, the classification accuracy of RSMDA steadily increases with the increase in the number of training samples.This also shows that RSMDA is more stable than SULDA to a certain extent.

4.6 Parameters selection

There are two regularization parameters, which can affect the performance of RSMDA. In the following, we examine the effects of these two parameters λ₁ and λ₂ on the proposed algorithm by examining the changes in the recognition performance of RSMDA under different parameter values. We choose two databases as the test set, they are CMU PIE and Yale B databases. The experimental results of the algorithm are shown in Fig. 7. We take (10⁻⁵,10⁻⁴, 10⁻³,10⁻²,10⁻¹,1, 10¹, 10², 10³, 10⁴, 10⁵) as the value range of the two regularization parameters λ₁ and λ₂, then execute the proposed method RSMDA. As can be seen from the Fig. 7, the proposed method performs best when the value λ₁ and λ₂ are close to 0.0001. Differences in the classification results show that the two parameters for the identification and classification of learning and performance projection algorithm has an important influence.

5 Conclusion

In this paper, we propose a novel supervised feature extraction method termed robust sparse manifold discriminant analysis (RSMDA), in which the global discriminative information, local manifold discriminative information and sparse representation are integrated into a framework. By using the L_2,1 sparse norm to limit the discriminative projection matrix, the proposed method can perform feature selection and feature extraction at the same time, and these features are more suitable for classification tasks because they have the most discriminative information. This reconstruction constraint minimizes the loss of difference information, thereby improving the accuracy of classification. In addition, the local identification information is introduced, which further enhances the discrimination of the extracted projection matrix. The experimental results on ~~six~~ five databases all show that this method performs better than other competing methods.

References

Andekah Z, Naderan M, Akbarizadeh G (2017) Semi-supervised Hyperspectral image classification using spatial-spectral features and superpixel-based sparse codes, in: 25th Iranian Conference on Electrical Engineering (ICEE), pp 2229–2234
Belkin M, Niyogi P (2019) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15:1373–1396
Article Google Scholar
Chang C, Lin C(2011) Libsvm: a library for support vector machines. ACM transactions on intelligent systems and technology(TIST). Pp 1-27.
Zhu F, Gao J, Yang J, Ye N (2022) Neighborhood linear discriminant analysis. Pattern Recognition 123:108422
Gao G, Yu Y, Yang M, et al. (2020). Cross-resolution face recognition with pose variations via multilayer locality-constrained structural orthogonal procrustes regression. Information science, pp 19-36.
Guo J, Gao J, Hu Y (2020) Robust adaptive linear discriminant analysis with bidirectional reconstruction constraint. ACM, Transactions on Knowledge Discovery from Data 14(6):5
He X, Niyogi P (2004) Locality preserving projections. Neural Inf Process Syst:153–160
He X, Cai D, Yan S, et al(2005) Neighborhood preserving embedding. IEEE international conference on computer vision, pp 1208-1213.
Lai Z, Mo D, Wong W (2018) Robust discriminant regression for feature extraction. IEEE Trans Cybernet 48:2472–2484
Article Google Scholar
Li Y, Tian X, Liu T (2018) On better exploring and exploiting task relationships in multitask learning: joint model and feature learning. IEEE TransNeural Netw Learn 29:1975–1985
Article MathSciNet Google Scholar
Liu T, Huang G (2018) An adaptive graph learning method based on dual data representations for clustering. Parttern Recognition. pp 126–139 .
Liu Z, Liu G, Pu J(2017) Noisy label based discriminative least squares regression and its kernel extension for object identification, KSII trans Internet Inf Syst. pp 2523-2538.
Liu Z, Liu G, Pu J, et al.(2018) Orthogonal sparse linear discriminant analysis. International journal of systems science. Pp 848-858.
Liu Z, Shi K, Zhang K, Ou W, Wang L (2020) Discriminative sparse embedding based on adaptive graph for dimension reduction. Eng Appl Artif Intell 94:103758
Article Google Scholar
Liu J, Song C, Zhao J,(2020) Manifold-preserving sparse graph-based ensemble FDA for industrial label-noise fault classification. IEEE Trans Instrum Meas. pp 2621-2634
Lu X, Wang Y (2017) Graph-regularized low-rank representation for destriping of hyperspectral images. IEEE Trans Geosci Remote Sens 51(7):4009–4018
Article Google Scholar
Lu Y, Lai Z, Xu Y, Li X, Zhang D, Yuan C (2016) Low-rank preserving projections. IEEE Trans Cybernet 46:1900–1912
Article Google Scholar
Luo M, Nie F , Chang X(2018) Adaptive unsupervised feature selection with structure regularization, IEEE trans. Neural Netw. Learn. Pp 944-956.
Mishra G (2020) Constrained L-1-optimal sparse representation technique for face recognition, Optics Laser Technol. pp 1975–1985 .
Modava M, Akbarizadeh G (2019) Coastline extraction from SAR images using spatial fuzzy clustering and the active contour method. Int J Remote Sens 38:355–370
Article Google Scholar
Ning X, Li W,Tang B, He H (2018) BULDP: biomimetic uncorrelated locality discriminant projection for feature extraction in face recognition , IEEE Trans Image Process
Ning X, Li W, Tang B, He H (2020) Real-time 3D face alignment using an encoder-decoder network with an efficient deconvolution layer. Letters, IEEE Signal Processing
Book Google Scholar
Qiao Z, Zhou L, Huang J(2009) Sparse linear discriminant analysiswith applications to high dimensional low sample size data. Int J Appl Math. pp 48-60.
Qiao L, Chen S, Tan X (2010) Sparsity preserving projections with applications to face recognition. Pattern Recogn 43:331–341
Article Google Scholar
A. Sellami, M. Farah, I. Farah, et al.(2017) Hyperspectral imagery classification based on semi-supervised 3-D deep neural network and adaptive band selection, expert Syst. Pp 246–259 .
Sharifzadeh F, Akbarizadeh G, Kavian Y (2019) Ship classification in SAR images using a new hybrid CNN-MLP classifier. J Indian Soc Remote Sens 47:551–562
Article Google Scholar
Taibi F, Akbarizadeh G, Farshidi E (2019) Robust reservoir rock fracture recognition based on a new sparse feature learning and data training method, Multidimension. Syst. Signal Process. pp 2113–2146 .
Wen J, Fang X, Cui J (2019) Robust sparse linear discriminant analysis. IEEE Trans Circuits Syst Video Technol 29:390–403
Article Google Scholar
Liu Z, Lu Y, Lai Z, Ou W, Zhang K (2021) Robust sparse low-rank embedding for image reduction. Applied Soft Computing 113:20211129
Yang W, Wang Z, Sun C (2015) A collaborative representation based projections method for feature extraction, Pattern Recognit. pp 20–27.
Ye J, Xiong T(2006) Null space versus orthogonal linear discriminantanalysis. International conference on machine leaning, pp 1073-1080.
Ye J, Janardan R, Li Q (2006) Feature reduction via generalized uncorrelated linear discriminant analysis. IEEE Trans Knowl Data Eng. pp 1312–1322
Yu W, Zhang M, Shen Y(2019) Learning a local manifold representation based on improved neighborhood rough set and LLE for hyperspectral dimensionality reduction, signal process. Pp 20-29.
Zeng Z, Wang X, Yan F(2019) Local adaptive learning for semi-supervised feature selection with group sparsity, Knowl-Based Syst. pp 181.
Zhang A (2020) Channel estimation for MmWave massive MIMO with hybrid precoding based on log-sum sparse constraints. Transactions on Image Processing, IEEE
Google Scholar
Zhang A (2020) Block-sparsity log-sum-induced adaptive filter for cluster sparse system identification. Access, IEEE
Book Google Scholar
Zhang A (2020) Reweighted l p constraint LMS-based adaptive sparse channel estimation for cooperative communication system. IET Communications, IEEE Access
Book Google Scholar
Zhang Z, Wang J, Zha H (2012) Adaptive manifold learning. IEEE Trans Pattern AnalMach Intell 34:253–265
Article Google Scholar
Zhang X, Chu D, Tan R(2016) Sparse uncorrelated linear discriminant analysis for undersampled problems, IEEE Trans Neural Netw Learn Syst, pp 1469-1485.
Zhang J, Luo Z, Li C(2019) Manifold regularized discriminative feature selection for multi-label learning, pattern Recognit. Pp 136-150.
Zhang L, Liu Z, Pu J (2020) Adaptive graph regularized nonnegative matrix factorization for data representation. Appl Intell 50(2):438–447
Article Google Scholar
Zhou T, Peng Y (2020) Kernel principal component analysis-based Gaussian process regression modelling for high-dimensional reliability analysis. Computers and Structures. pp. 241
Zhou Y, Sun S (2017) Manifold partition discriminant analysis. IEEE Trans Cybern 47:830–840
Article Google Scholar
Zhu X, Ghahramani Z, Lafferty J (2013) Semisupervised learning using gaussian fields and harmonic functions. In: Machine learning, proceedings of the twentieth international conference (ICML). Washington, DC, USA, pp 21–24
Google Scholar

Download references

Acknowledgements

This work was supported by Natural Science Foundation of China (U1504610, 61971339, 61471161), the Key Project of the Natural Science Foundation of Shanxi Province (2018JZ6002), Scientific and Technological Innovation Team of Colleges and Universities in Henan Province (20IRTSTHN018), the Doctoral Startup Foundation of Xi’an Polytechnic University (BS1616). the Natural Science Foundations of Henan Province (202300410148).

Author information

Authors and Affiliations

Information Engineering College, Henan University of Science and Technology, Luoyang, China
Jingjing Wang, Zhonghua Liu, Qingtao Wu & Mingchuan Zhang
College of Electronics and Information, Xi’an Polytechnic University, Xi’an, China
Kaibing Zhang

Authors

Jingjing Wang
View author publications
You can also search for this author in PubMed Google Scholar
Zhonghua Liu
View author publications
You can also search for this author in PubMed Google Scholar
Kaibing Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Qingtao Wu
View author publications
You can also search for this author in PubMed Google Scholar
Mingchuan Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhonghua Liu.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wang, J., Liu, Z., Zhang, K. et al. Robust sparse manifold discriminant analysis. Multimed Tools Appl 81, 20781–20796 (2022). https://doi.org/10.1007/s11042-022-12708-3

Download citation

Received: 01 December 2020
Revised: 04 March 2021
Accepted: 21 February 2022
Published: 12 March 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11042-022-12708-3

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Robust sparse manifold discriminant analysis

Abstract

Similar content being viewed by others

Sparse L1-norm-based linear discriminant analysis

A supervised discriminant data representation: application to pattern classification

Linear Discriminant Analysis with Maximum Correntropy Criterion

1 Introduction