
1 Introduction

For many machine learning and pattern recognition applications, it is difficult to obtain enough labeled samples, whereas large numbers of unlabeled samples are readily available, for example over the Internet. Semi-supervised learning (SSL), which exploits both the limited labeled samples and the abundant unlabeled samples, has therefore become a research focus. Among existing methods, graph-based SSL has attracted much attention because it can effectively capture the structural information hidden in the data and achieves good performance in practical applications [1].

Graph-based SSL employs a graph to represent the data structure, where the set of vertices corresponds to the samples and the set of edges is associated with an adjacency matrix that measures the pairwise weights between vertices. Label information of the labeled samples can be propagated to the unlabeled samples over the graph by label propagation algorithms, such as local and global consistency (LGC) [2] and Gaussian fields and harmonic functions (GFHF) [3]. How to construct a good graph is the key difficulty of these algorithms, and it remains an open problem. Liu et al. [4] propose low rank representation (LRR), which constructs a low rank graph by solving a nuclear norm minimization problem. LRR can capture the global structure of the data and performs well on the subspace clustering problem. Zhuang et al. [5] extend LRR and propose the non-negative low rank and sparse graph (NNLRS). Compared with LRR, NNLRS adds a sparse constraint to the objective function, so it can capture both the global and the local structure of the data. In [6, 7], the authors build on NNLRS and propose weighted sparse constraints, in which the sparse regularization term is weighted by different weight matrices; this effectively preserves the local structure of the data.

We observe that the above algorithms use the nuclear norm to estimate the rank of the matrix. However, the nuclear norm is only a convex relaxation of the rank function and cannot estimate the rank accurately. Choosing a more suitable surrogate function for the rank can improve the performance of these algorithms. Kang et al. [8] propose a rank approximation based on the logarithm-determinant, which improves the accuracy of subspace clustering. Inspired by the elastic net [9] in learning theory, we use both the nuclear norm and the Frobenius norm as the surrogate function, so that the rank can be estimated more effectively and a more exact low rank representation can be obtained. On the other hand, to improve the ability to capture the local structure of the data, we also add a weighted sparse regularization term to the objective function. Different from [6, 7], we utilize shape interaction information to construct the weight matrix, which makes the graph contain more information.

The remainder of this paper is organized as follows. We give an overview of the LRR algorithm in Sect. 2. In Sect. 3, we present the proposed low rank and weighted sparse graph (LRWSG) and its optimization by the linearized alternating direction method with adaptive penalty (LADMAP) [10]. The experimental results on three widely used face databases are presented in Sect. 4. Finally, we conclude this paper in Sect. 5.

2 Related Work

This section briefly introduces LRR. Let \( X = [x_{1} ,x_{2} , \ldots ,x_{n} ] \in {\mathbb{R}}^{d \times n} \) be a matrix whose columns are n data samples in the d dimensional space. LRR seeks the coefficient matrix \( Z = [z_{1} ,z_{2} , \ldots ,z_{n} ] \in {\mathbb{R}}^{n \times n} \) of the lowest rank that represents \( X \) as a linear combination of itself. The LRR problem is defined as follows:

$$ \mathop {\hbox{min} }\limits_{Z} ||Z||_{*} + \lambda ||E||_{2,1} , { }s.t. \, X = XZ + E. $$
(1)

where \( || \cdot ||_{*} \) is the nuclear norm of a matrix (the sum of its singular values), and \( ||E||_{2,1} = \sum_{j = 1}^{n} (\sum_{i = 1}^{d} E_{ij}^{2} )^{1/2} \) is the 2,1-norm, which is used to model the noise. The parameter \( \lambda \) balances the effect of the noise term. The inexact augmented Lagrange multiplier (IALM) [11] method is employed to solve problem (1), which yields the optimal solution \( (Z^{*} ,E^{*} ) \). The adjacency matrix of the low rank graph can then be calculated as follows:

$$ G = (|Z^{*} | + |Z^{*} |^{T} )/2 $$
(2)

After we get the adjacency matrix, the LGC or GFHF algorithm is used to propagate the label information and obtain the semi-supervised classification results.
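To make this pipeline concrete, the following minimal Python/numpy sketch builds the adjacency matrix of Eq. (2) from a given coefficient matrix and propagates labels with LGC. The function name lgc_propagate and the default propagation parameter are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def lgc_propagate(G, Y, alpha=0.99):
    """Label propagation by local and global consistency (LGC).

    G : (n, n) symmetric adjacency matrix, e.g. built from Z* via Eq. (2).
    Y : (n, c) initial label matrix, Y[i, j] = 1 if sample i is labeled as class j.
    alpha : LGC propagation parameter (unrelated to the alpha of Sect. 3).
    """
    d = G.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ G @ D_inv_sqrt            # symmetrically normalized graph
    n = G.shape[0]
    # closed-form LGC solution F* = (I - alpha * S)^(-1) Y
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)                    # predicted class of every sample

# adjacency from the LRR coefficients, following Eq. (2):
# G = (np.abs(Z_star) + np.abs(Z_star).T) / 2
# labels = lgc_propagate(G, Y)
```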

3 The Proposed Method

3.1 Problem Formulation

The elastic net, which utilizes both the 1-norm and the 2-norm as the penalty function, is an effective model in statistical learning [9]. The 1-norm guarantees the sparsity of the solution, while the 2-norm guarantees its stability. The model also performs well on the low rank matrix completion problem [12].

We observe that \( ||Z||_{*} \) in Eq. (1) can be written as \( \sum_{i = 1}^{r} |\sigma_{i} | \), where \( \sigma_{i} \) is the ith singular value of \( Z \) and r is the rank of \( Z \). Obviously, \( \sum_{i = 1}^{r} |\sigma_{i} | \) is a 1-norm penalty on the singular values of \( Z \). To improve the stability of the algorithm, we introduce \( \sum_{i = 1}^{r} |\sigma_{i} |^{2} \) as a 2-norm penalty on the singular values of \( Z \). In fact, \( ||Z||_{F}^{2} = Tr(V\Lambda U^{T} U\Lambda V^{T} ) = Tr(\Lambda^{2} ) = \sum_{i = 1}^{r} |\sigma_{i} |^{2} \), where \( Z = U\Lambda V^{T} \) is the SVD of \( Z \). By combining the 1-norm penalty and the 2-norm penalty, we can rewrite Eq. (1) as follows:

$$ \mathop {\hbox{min} }\limits_{Z} ||Z||_{*} + \, \alpha ||Z||_{F}^{2} + \lambda ||E||_{2,1} , { }s.t. \, X = XZ + E. $$
(3)

where the parameter α trades off the effect of the 1-norm penalty against the 2-norm penalty. Compared with Eq. (1), Eq. (3) is a more stable model which can estimate the rank of \( Z \) and capture the global subspace structure more exactly.
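As a quick numerical sanity check of the identity \( ||Z||_{F}^{2} = \sum_{i} |\sigma_{i} |^{2} \) invoked above, one can run the following small illustrative snippet (not part of the original method):

```python
import numpy as np

# Verify ||Z||_F^2 = sum_i sigma_i^2 for a random matrix.
rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 5))
sigma = np.linalg.svd(Z, compute_uv=False)    # singular values of Z
print(np.linalg.norm(Z, 'fro') ** 2)          # Frobenius norm squared
print(np.sum(sigma ** 2))                     # identical up to rounding error
```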

In order to capture the local linear structure, \( ||Z||_{1} \) is added to Eq. (1) in [5]. Later, [6, 7] propose a weighted sparse constraint \( ||W \odot Z||_{1} \), where \( \odot \) denotes the Hadamard product: if \( M = A \odot B \), then \( M_{ij} = A_{ij} \times B_{ij} \). Constructing a weight matrix \( W \) that contains more information helps preserve the local structure of the data. Inspired by [13], we utilize shape interaction information to construct \( W \). Let \( X = U_{r}\Lambda_{r} V_{r}^{T} \) be the skinny SVD of \( X \), where r is the rank of \( X \). The shape interaction representation of each data sample \( x_{i} \) is \( R_{i} = \Lambda_{r}^{ - 1} U_{r}^{T} x_{i} \). Each \( R_{i} \) is normalized by \( R_{i}^{*} = R_{i} /||R_{i} ||_{2} \), and the shape interaction weight matrix is defined as follows:

(4)
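The computation of the normalized shape interaction representation described above can be sketched in numpy as follows; the helper name and the rank tolerance are assumptions made for illustration, and the final assembly of \( W \) from these columns follows Eq. (4).

```python
import numpy as np

def shape_interaction_features(X, tol=1e-10):
    """Column-normalized shape interaction representations R_i^* of all samples.

    X : (d, n) data matrix whose columns are the samples.
    Returns an (r, n) matrix whose i-th column is R_i^* = R_i / ||R_i||_2,
    with R_i = Lambda_r^{-1} U_r^T x_i and r = rank(X) (skinny SVD of X).
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))            # numerical rank of X
    U_r, s_r = U[:, :r], s[:r]
    R = (U_r.T @ X) / s_r[:, None]             # R_i = Lambda_r^{-1} U_r^T x_i
    R /= np.linalg.norm(R, axis=0, keepdims=True) + 1e-12
    return R

# The weight matrix W is then built from the pairwise relations of these
# normalized columns, following Eq. (4) in the paper.
```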

In summary, we formulate the objective function of LRWSG as follows:

$$ \mathop {\hbox{min} }\limits_{Z} ||Z||_{*} + \alpha ||Z||_{F}^{2} + \beta ||W \odot Z||_{1} + \lambda ||E||_{2,1} , { }s.t. \, X = XZ + E, \, Z \ge 0. $$
(5)

3.2 Optimization

Similar to [5], we utilize LADMAP to solve problem (5). We first introduce an auxiliary variable \( J \) to separate the variable in the objective function. Thus Eq. (5) can be rewritten as follows:

$$ \mathop {\hbox{min} }\limits_{Z,J,E} ||Z||_{*} + \alpha ||Z||_{F}^{2} + \beta ||W \odot J||_{1} + \lambda ||E||_{2,1} , { }s.t. \, X = XZ + E, \, Z = J, \, J \ge 0. $$
(6)

The augmented Lagrange function of Eq. (6) is

$$ \begin{aligned} L & = ||Z||_{*} + \alpha ||Z||_{F}^{2} + \beta ||W \odot J||_{1} + \lambda ||E||_{2,1} + \langle Y_{1} ,X - XZ - E\rangle + \langle Y_{2} ,Z - J\rangle \\ & \quad + \frac{\mu }{2}\left( ||X - XZ - E||_{F}^{2} + ||Z - J||_{F}^{2} \right) \end{aligned} $$
(7)

where \( Y_{1} \) and \( Y_{2} \) are Lagrange multipliers, \( \mu > 0 \) is a penalty parameter.

Update \( Z_{k + 1} \) with \( Z_{k} \), \( J_{k} \), \( E_{k} \) fixed.

$$ \begin{aligned} Z_{k + 1} & = \mathop {\arg \hbox{min} }\limits_{Z} \frac{1}{{\eta \mu_{k} }}||Z||_{*} + \frac{1}{2}||Z - (Z_{k} + (X^{T} (X - XZ_{k} - E_{k} + Y_{1,k} /\mu_{k} ) \\ & \quad - (Z_{k} - J_{k} + Y_{2,k} /\mu_{k} ) - (2\alpha /\mu_{k} )Z_{k} )/\eta )||_{F}^{2} \\ \end{aligned} $$
(8)

where \( \eta = ||X||_{2}^{2} \). Equation (8) can be solved by the singular value thresholding operator [14]. Let \( A = Z_{k} + (X^{T} (X - XZ_{k} - E_{k} + Y_{1,k} /\mu_{k} ) - (Z_{k} - J_{k} + Y_{2,k} /\mu_{k} ) - (2\alpha /\mu_{k} )Z_{k} )/\eta \) and let \( A = U\Lambda V^{T} \) be the SVD of \( A \). The solution of Eq. (8) is \( Z_{k + 1} = US_{1/(\eta \mu_{k} )} \left(\Lambda\right)V^{T} \), where \( S \) is the soft thresholding operator [11], defined as \( S_{\varepsilon } [x] = \hbox{max} (x - \varepsilon ,0) + \hbox{min} (x + \varepsilon ,0) \).
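For reference, a minimal numpy sketch of this singular value thresholding step is given below; the variable names are illustrative assumptions and \( \eta \) follows the definition above.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding U S_tau(Lambda) V^T, used for the Z-update of Eq. (8)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)        # soft threshold the singular values
    return (U * s_shrunk) @ Vt

# Z-update, assuming Zk, Jk, Ek, Y1, Y2, mu, alpha and eta = ||X||_2^2 are given:
# A = Zk + (X.T @ (X - X @ Zk - Ek + Y1 / mu)
#           - (Zk - Jk + Y2 / mu) - (2 * alpha / mu) * Zk) / eta
# Z_next = svt(A, 1.0 / (eta * mu))
```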

Update \( J_{k + 1} \) with \( Z_{k + 1} \), \( J_{k} \), \( E_{k} \) fixed.

$$ J_{k + 1} = \mathop {\arg \hbox{min} }\limits_{J \ge 0} \frac{\beta }{{\mu_{k} }}||W \odot J||_{1} + \frac{1}{2}||J - (Z_{k + 1} + Y_{2,k} /\mu_{k} )||_{F}^{2} $$
(9)

Equation (9) can be solved by the soft thresholding operator, and the solution is

$$ (J_{k + 1} )_{ij} = \hbox{max} (S_{{\varepsilon_{ij} }} [(Z_{k + 1} )_{ij} + (Y_{2,k} )_{ij} /\mu_{k} ],0) $$
(10)

where \( \varepsilon_{ij} = \beta W_{ij} /\mu_{k} \).
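A compact numpy version of this update is sketched below, under the element-wise thresholds \( \varepsilon_{ij} \) given above; the function name is an illustrative assumption.

```python
import numpy as np

def update_J(Z_next, Y2, W, beta, mu):
    """J-update of Eq. (10): weighted soft thresholding followed by clipping at zero."""
    T = Z_next + Y2 / mu                       # target matrix Z_{k+1} + Y_{2,k}/mu_k
    eps = beta * W / mu                        # element-wise thresholds eps_ij
    shrunk = np.sign(T) * np.maximum(np.abs(T) - eps, 0.0)   # soft thresholding
    return np.maximum(shrunk, 0.0)             # keep the non-negative part
```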

Update \( E_{k + 1} \) with \( Z_{k + 1} \), \( J_{k + 1} \), \( E_{k} \) fixed.

$$ E_{k + 1} = \mathop {\arg \hbox{min} }\limits_{E} \frac{\lambda }{{\mu_{k} }}||E||_{2,1} + \frac{1}{2}||E - (X - XZ_{k + 1} + Y_{1,k} /\mu_{k} )||_{F}^{2} $$
(11)

The solution of Eq. (11) is

$$ E_{k + 1} = \Omega_{{\lambda /\mu_{k} }} (X - XZ_{k + 1} + Y_{1,k} /\mu_{k} ) $$
(12)

where \( \Omega \) is the 2,1-norm minimization operator [4]. If \( Y = \Omega_{\varepsilon } (X) \), then the ith column of \( Y \) is

$$Y(:,i) = \left\{ {\begin{array}{ll} {\frac{{||X(:,i)||_{2} -\varepsilon }}{{||X(:,i)||_{2} }}X(:,i)}, & \varepsilon < ||X(:,i)||_{2} \\ 0, & \varepsilon \ge ||X(:,i)||_{2} \\ \end{array} } \right. $$
(13)
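The operator of Eq. (13) and the E-update of Eq. (12) can be sketched in numpy as follows (the helper and variable names are illustrative assumptions):

```python
import numpy as np

def l21_shrink(X, eps):
    """Column-wise 2,1-norm minimization operator Omega_eps of Eq. (13)."""
    Y = np.zeros_like(X)
    norms = np.linalg.norm(X, axis=0)          # ||X(:, i)||_2 for every column
    keep = norms > eps                         # columns with norm larger than eps
    Y[:, keep] = X[:, keep] * ((norms[keep] - eps) / norms[keep])
    return Y

# E-update of Eq. (12), assuming X, Z_next, Y1, mu and lam are given:
# E_next = l21_shrink(X - X @ Z_next + Y1 / mu, lam / mu)
```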

The complete optimization procedure for LRWSG is summarized in Algorithm 1.
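Since the pseudo-code of Algorithm 1 is not reproduced here, the overall LADMAP loop can be sketched as follows, reusing the helpers svt, update_J and l21_shrink from the previous sketches. The default parameters, the multiplier updates, the penalty schedule and the stopping rule below are standard LADMAP choices stated as assumptions, not necessarily the exact Algorithm 1.

```python
import numpy as np

def lrwsg_ladmap(X, W, alpha=1.0, beta=0.3, lam=10.0,
                 mu=1e-2, mu_max=1e10, rho=1.5, tol=1e-6, max_iter=500):
    """Sketch of a LADMAP loop for problem (5), following Eqs. (8)-(13)."""
    n = X.shape[1]
    Z, J, E = np.zeros((n, n)), np.zeros((n, n)), np.zeros_like(X)
    Y1, Y2 = np.zeros_like(X), np.zeros((n, n))
    eta = np.linalg.norm(X, 2) ** 2            # eta = ||X||_2^2 (spectral norm squared)
    for _ in range(max_iter):
        A = Z + (X.T @ (X - X @ Z - E + Y1 / mu)
                 - (Z - J + Y2 / mu) - (2 * alpha / mu) * Z) / eta
        Z = svt(A, 1.0 / (eta * mu))                      # Eq. (8)
        J = update_J(Z, Y2, W, beta, mu)                  # Eqs. (9)-(10)
        E = l21_shrink(X - X @ Z + Y1 / mu, lam / mu)     # Eqs. (11)-(13)
        R1, R2 = X - X @ Z - E, Z - J                     # constraint residuals
        Y1, Y2 = Y1 + mu * R1, Y2 + mu * R2               # multiplier updates
        mu = min(rho * mu, mu_max)                        # penalty update
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E
```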

3.3 Graph Construction

Once problem (5) is solved, we obtain an optimal \( Z^{*} \). Different from traditional graph-based SSL, which constructs the adjacency matrix by Eq. (2), we utilize the post-processing method used for the subspace clustering problem [4]. Let \( Z^{*} = U^{*}\Lambda^{*} \left( {V^{*} } \right)^{T} \) be the skinny SVD of \( Z^{*} \) and define \( P = U^{*} \left( {\Lambda^{*} } \right)^{1/2} \). The adjacency matrix of LRWSG is then calculated as follows:

$$ (G)_{ij} = (PP^{T} )_{ij}^{2} $$
(14)

After we obtain the adjacency matrix \( G \), the LGC algorithm is employed to solve the semi-supervised classification problem.
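A short numpy sketch of this post-processing is given below (helper name and rank tolerance are illustrative assumptions); the resulting matrix can be fed to the LGC sketch from Sect. 2.

```python
import numpy as np

def lrwsg_graph(Z_star, tol=1e-10):
    """Adjacency matrix of Eq. (14): G_ij = ((P P^T)_ij)^2 with P = U* (Lambda*)^(1/2)."""
    U, s, _ = np.linalg.svd(Z_star, full_matrices=False)
    r = int(np.sum(s > tol * max(s[0], tol)))  # skinny SVD: keep the nonzero part
    P = U[:, :r] * np.sqrt(s[:r])              # P = U* (Lambda*)^(1/2)
    return (P @ P.T) ** 2                      # element-wise squaring, Eq. (14)

# G = lrwsg_graph(Z_star); labels = lgc_propagate(G, Y)   # reuse the LGC sketch above
```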

4 Experiment

In this section, we evaluate the effectiveness of LRWSG in semi-supervised classification experiments. LRWSG is compared with several LRR-related graphs, including LRR [4], NNLRS [5], LRRLC [6], SCLRR [7] and CLAR [8]. Classification accuracy, defined as the percentage of correctly classified test samples, is used to evaluate the semi-supervised classification performance. The parameters of the compared algorithms are tuned to achieve their best performance. In LRWSG, the parameter \( \alpha \) balances the effect of the nuclear norm and the Frobenius norm; based on extensive experiments, we set \( \alpha = 1 \). The parameter \( \lambda \) describes the noise level of the data, and we set \( \lambda = 10 \) in our experiments. The parameter \( \beta \) controls the effect of the sparse regularization term; we set \( \beta = 0.3 \) on the ORL and EYaleB databases and \( \beta = 0.1 \) on the AR database. The experiments are implemented on an Intel Core i7 4710MQ CPU with 8 GB of memory.

4.1 Databases

We select three face databases for our experiments: ORL, Extended Yale B (EYaleB) and AR. The ORL database contains 40 distinct subjects, each with 10 different images. The images were taken at different times, with varying lighting, facial expressions and facial details. The EYaleB database contains 64 face images under different illuminations for each of 38 individuals; in our experiments, we use the first 20 subjects and the first 50 images of each subject. The AR database contains 3120 images of 120 subjects with different facial expressions, lighting conditions and occlusions; the first 50 subjects are chosen for our experiments, with the first 20 images of each subject. All images are resized to \( 32 \times 32 \). Several sample images from the three face databases are shown in Fig. 1.

Fig. 1. Some sample images from the three databases: (a) ORL, (b) EYaleB, (c) AR

4.2 Experimental Results and Analysis

For each database, we randomly choose 10 % to 60 % of the samples from each class as labeled samples, and the remaining samples are used for testing. For each percentage of labeled samples, we repeat the experiment for 20 trials for each algorithm. Tables 1, 2 and 3 report the classification accuracies and standard deviations of each algorithm on ORL, EYaleB and AR, respectively.

Table 1. The classification accuracies and standard deviations (%) on ORL
Table 2. The classification accuracies and standard deviations (%) on EYaleB
Table 3. The classification accuracies and standard deviations (%) on AR

From the results, we can observe that:

(1) LRWSG achieves the highest accuracies on all three databases. LRWSG utilizes the nuclear norm and the Frobenius norm to estimate the rank function. Meanwhile, the weighted sparse regularization term with shape interaction information is incorporated into the objective function. Therefore, LRWSG can capture both the global subspace structure and the local linear structure exactly. Moreover, the standard deviations of LRWSG are usually small, which shows the stability of LRWSG.

(2) CLAR uses the logarithm-determinant function to estimate the rank function, which improves the performance of LRR. However, compared with the algorithms that consider both the low rank and the sparse property, CLAR performs worse. Building on LRR, NNLRS, LRRLC and SCLRR propose different sparsity constraints. Although the weight matrices of their sparse regularization terms differ, the performance of these three algorithms is similar. With a more exact rank estimate and an informative weighted sparse matrix, LRWSG performs better than these three algorithms.

(3) As the number of labeled samples increases, the classification accuracy of each algorithm also increases. When more labeled samples are given, the label information is more abundant and every algorithm performs well. When fewer labeled samples are given, classification becomes more difficult, but LRWSG still achieves higher classification accuracies. For example, with 10 % labeled samples, the accuracy of LRWSG is 82.10 % on the ORL database, which is 6.17 % higher than the best result obtained by the other algorithms.

5 Conclusion

This paper proposes a novel semi-supervised learning algorithm based on a low rank and weighted sparse graph (LRWSG), and applies it to face recognition. In order to capture the data structure exactly, LRWSG makes use of the nuclear norm and the Frobenius norm to estimate the rank function, and adds a weighted sparse constraint with shape interaction information to the objective function. LADMAP is employed to solve the optimization problem, and with an effective post-processing method, the graph is constructed and used for semi-supervised classification. Experimental results on the ORL, EYaleB and AR databases show that the proposed approach achieves better classification performance.