
1 Introduction

For many machine learning and pattern recognition applications, it is difficult to obtain enough labeled samples, whereas large numbers of unlabeled samples are readily available, for example over the Internet. Semi-supervised learning (SSL), which exploits both the limited labeled samples and the abundant unlabeled samples, has therefore become a research focus. Among existing methods, graph-based SSL has attracted much attention because it can effectively capture the structural information hidden in the data and achieves good performance in practical applications [1].

Graph-based SSL employs a graph to represent the data structure, where the set of vertices corresponds to the samples and the set of edges is associated with an adjacency matrix that measures the pairwise weights between vertices. Label information of the labeled samples can be propagated to the unlabeled samples over the graph by label propagation algorithms, such as local and global consistency (LGC) [2] and Gaussian fields and harmonic functions (GFHF) [3]. How to construct a good graph is the key difficulty of these algorithms, and it remains an open problem. Liu et al. [4] propose low rank representation (LRR), which constructs a low rank graph by solving a nuclear norm minimization problem. LRR can capture the global structure of the data and performs well on the subspace clustering problem. Zhuang et al. [5] extend LRR and propose the non-negative low rank and sparse graph (NNLRS). Compared with LRR, NNLRS adds a sparse constraint to the objective function, so it can capture both the global and the local structure of the data. In [6, 7], the authors build on NNLRS and propose weighted sparse constraints, in which the sparse regularization term is weighted by different weight matrices; this effectively preserves the local structure of the data.

We observe that the above algorithms use the nuclear norm to estimate the rank of the matrix. However, the nuclear norm is only a convex relaxation of the rank function and cannot estimate the rank accurately. Choosing a more suitable surrogate function for the rank can improve the performance of these algorithms. Kang et al. [8] propose a rank approximation based on the logarithm-determinant, which improves the accuracy of subspace clustering. Inspired by the elastic net [9] in learning theory, we use both the nuclear norm and the Frobenius norm as the surrogate function, so that the rank can be estimated more effectively and a more exact low rank representation can be obtained. On the other hand, to improve the ability to capture the local structure of the data, we also add a weighted sparse regularization term to the objective function. Different from [6, 7], we utilize shape interaction information to construct the weight matrix, which makes the graph contain more information.

The remainder of this paper is organized as follows. We give an overview of the LRR algorithm in Sect. 2. In Sect. 3, we present the proposed low rank and weighted sparse graph (LRWSG) and its optimization by the linearized alternating direction method with adaptive penalty (LADMAP) [10]. The experimental results on three widely used face databases are presented in Sect. 4. Finally, we conclude this paper in Sect. 5.

2 Related Work

This section briefly introduces LRR. Let \( X = [x_{1} ,x_{2} , \ldots ,x_{n} ] \in {\mathbb{R}}^{d \times n} \) be a matrix whose columns are n data samples in the d dimensional space. LRR seeks the coefficient matrix \( Z = [z_{1} ,z_{2} , \ldots ,z_{n} ] \in {\mathbb{R}}^{n \times n} \) of the lowest rank that represents \( X \) as a linear combination of itself. The LRR problem is defined as follows:

$$ \mathop {\hbox{min} }\limits_{Z} ||Z||_{*} + \lambda ||E||_{2,1} , { }s.t. \, X = XZ + E. $$
(1)

where \( || \cdot ||_{*} \) is the nuclear norm of a matrix (the sum of its singular values), and \( ||E||_{2,1} = \sum_{j = 1}^{n} (\sum_{i = 1}^{d} E_{ij}^{2} )^{1/2} \) is the 2,1-norm, which is used to model the noise. The parameter \( \lambda \) balances the effect of the noise term. The inexact augmented Lagrange multiplier (IALM) [11] method is employed to solve problem (1), which yields the optimal solution \( (Z^{*} ,E^{*} ) \). The adjacency matrix of the low rank graph can then be calculated as follows:

$$ G = (|Z^{*} | + |Z^{*} |^{T} )/2 $$
(2)

After we get the adjacency matrix, the LGC or GFHF algorithm is used to propagate the label information and obtain the semi-supervised classification results.
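To make this pipeline concrete, the following minimal Python/numpy sketch builds the adjacency matrix of Eq. (2) from a given coefficient matrix and propagates labels with LGC. The function name lgc_propagate and the default propagation parameter are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def lgc_propagate(G, Y, alpha=0.99):
    """Label propagation by local and global consistency (LGC).

    G : (n, n) symmetric adjacency matrix, e.g. built from Z* via Eq. (2).
    Y : (n, c) initial label matrix, Y[i, j] = 1 if sample i is labeled as class j.
    alpha : LGC propagation parameter (unrelated to the alpha of Sect. 3).
    """
    d = G.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ G @ D_inv_sqrt            # symmetrically normalized graph
    n = G.shape[0]
    # closed-form LGC solution F* = (I - alpha * S)^(-1) Y
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)                    # predicted class of every sample

# adjacency from the LRR coefficients, following Eq. (2):
# G = (np.abs(Z_star) + np.abs(Z_star).T) / 2
# labels = lgc_propagate(G, Y)
```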

3 The Proposed Method

3.1 Problem Formulation

The elastic net, which utilizes both the 1-norm and the 2-norm as the penalty function, is an effective model in statistical learning [9]. The 1-norm guarantees the sparsity of the solution, while the 2-norm guarantees its stability. The model also performs well on the low rank matrix completion problem [12].

We observe that \( ||Z||_{*} \) in Eq. (1) can be written as \( \sum_{i = 1}^{r} |\sigma_{i} | \), where \( \sigma_{i} \) is the ith singular value of \( Z \) and r is the rank of \( Z \). Obviously, \( \sum_{i = 1}^{r} |\sigma_{i} | \) is a 1-norm penalty on the singular values of \( Z \). To improve the stability of the algorithm, we introduce \( \sum_{i = 1}^{r} |\sigma_{i} |^{2} \) as a 2-norm penalty on the singular values of \( Z \). In fact, \( ||Z||_{F}^{2} = Tr(V\Lambda U^{T} U\Lambda V^{T} ) = Tr(\Lambda^{2} ) = \sum_{i = 1}^{r} |\sigma_{i} |^{2} \), where \( Z = U\Lambda V^{T} \) is the SVD of \( Z \). By combining the 1-norm penalty and the 2-norm penalty, we can rewrite Eq. (1) as follows:

$$ \mathop {\hbox{min} }\limits_{Z} ||Z||_{*} + \, \alpha ||Z||_{F}^{2} + \lambda ||E||_{2,1} , { }s.t. \, X = XZ + E. $$
(3)

where the parameter α trades off the effect of the 1-norm penalty against the 2-norm penalty. Compared with Eq. (1), Eq. (3) is a more stable model which can estimate the rank of \( Z \) and capture the global subspace structure more exactly.
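As a quick numerical sanity check of the identity \( ||Z||_{F}^{2} = \sum_{i} |\sigma_{i} |^{2} \) invoked above, one can run the following small illustrative snippet (not part of the original method):

```python
import numpy as np

# Verify ||Z||_F^2 = sum_i sigma_i^2 for a random matrix.
rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 5))
sigma = np.linalg.svd(Z, compute_uv=False)    # singular values of Z
print(np.linalg.norm(Z, 'fro') ** 2)          # Frobenius norm squared
print(np.sum(sigma ** 2))                     # identical up to rounding error
```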

In order to capture the local linear structure, \( ||Z||_{1} \) is added to Eq. (1) in [5]. Later, [6, 7] propose a weighted sparse constraint \( ||W \odot Z||_{1} \), where \( \odot \) denotes the Hadamard product: if \( M = A \odot B \), then \( M_{ij} = A_{ij} \times B_{ij} \). Constructing a weight matrix \( W \) that contains more information helps preserve the local structure of the data. Inspired by [13], we utilize shape interaction information to construct \( W \). Let \( X = U_{r}\Lambda_{r} V_{r}^{T} \) be the skinny SVD of \( X \), where r is the rank of \( X \). The shape interaction representation of each data sample \( x_{i} \) is \( R_{i} = \Lambda_{r}^{ - 1} U_{r}^{T} x_{i} \). Each \( R_{i} \) is normalized by \( R_{i}^{*} = R_{i} /||R_{i} ||_{2} \), and the shape interaction weight matrix is defined as follows:

(4)
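The computation of the normalized shape interaction representation described above can be sketched in numpy as follows; the helper name and the rank tolerance are assumptions made for illustration, and the final assembly of \( W \) from these columns follows Eq. (4).

```python
import numpy as np

def shape_interaction_features(X, tol=1e-10):
    """Column-normalized shape interaction representations R_i^* of all samples.

    X : (d, n) data matrix whose columns are the samples.
    Returns an (r, n) matrix whose i-th column is R_i^* = R_i / ||R_i||_2,
    with R_i = Lambda_r^{-1} U_r^T x_i and r = rank(X) (skinny SVD of X).
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))            # numerical rank of X
    U_r, s_r = U[:, :r], s[:r]
    R = (U_r.T @ X) / s_r[:, None]             # R_i = Lambda_r^{-1} U_r^T x_i
    R /= np.linalg.norm(R, axis=0, keepdims=True) + 1e-12
    return R

# The weight matrix W is then built from the pairwise relations of these
# normalized columns, following Eq. (4) in the paper.
```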

In summary, we formulate the objective function of LRWSG as follows:

$$ \mathop {\hbox{min} }\limits_{Z} ||Z||_{*} + \alpha ||Z||_{F}^{2} + \beta ||W \odot Z||_{1} + \lambda ||E||_{2,1} , { }s.t. \, X = XZ + E, \, Z \ge 0. $$
(5)

3.2 Optimization

Similar to [5], we utilize LADMAP to solve problem (5). We first introduce an auxiliary variable \( J \) to separate the variable in the objective function. Thus Eq. (5) can be rewritten as follows:

$$ \mathop {\hbox{min} }\limits_{Z,J,E} ||Z||_{*} + \alpha ||Z||_{F}^{2} + \beta ||W \odot J||_{1} + \lambda ||E||_{2,1} , { }s.t. \, X = XZ + E, \, Z = J, \, J \ge 0. $$
(6)

The augmented Lagrange function of Eq. (6) is

$$ \begin{aligned} L & = ||Z||_{*} + \alpha ||Z||_{F}^{2} + \beta ||W \odot J||_{1} + \lambda ||E||_{2,1} + \langle Y_{1} ,X - XZ - E\rangle + \langle Y_{2} ,Z - J\rangle \\ & \quad + \frac{\mu }{2}\left( ||X - XZ - E||_{F}^{2} + ||Z - J||_{F}^{2} \right) \end{aligned} $$
(7)

where \( Y_{1} \) and \( Y_{2} \) are Lagrange multipliers, \( \mu > 0 \) is a penalty parameter.

Update \( Z_{k + 1} \) with \( Z_{k} \), \( J_{k} \), \( E_{k} \) fixed.

$$ \begin{aligned} Z_{k + 1} & = \mathop {\arg \hbox{min} }\limits_{Z} \frac{1}{{\eta \mu_{k} }}||Z||_{*} + \frac{1}{2}||Z - (Z_{k} + (X^{T} (X - XZ_{k} - E_{k} + Y_{1,k} /\mu_{k} ) \\ & \quad - (Z_{k} - J_{k} + Y_{2,k} /\mu_{k} ) - (2\alpha /\mu_{k} )Z_{k} )/\eta )||_{F}^{2} \\ \end{aligned} $$
(8)

where \( \eta = ||X||_{2}^{2} \). Equation (8) can be solved by the singular value thresholding operator [14]. Let \( A = Z_{k} + (X^{T} (X - XZ_{k} - E_{k} + Y_{1,k} /\mu_{k} ) - (Z_{k} - J_{k} + Y_{2,k} /\mu_{k} ) - (2\alpha /\mu_{k} )Z_{k} )/\eta \) and let \( A = U\Lambda V^{T} \) be the SVD of \( A \). The solution of Eq. (8) is \( Z_{k + 1} = US_{1/(\eta \mu_{k} )} \left(\Lambda\right)V^{T} \), where \( S \) is the soft thresholding operator [11], defined as \( S_{\varepsilon } [x] = \hbox{max} (x - \varepsilon ,0) + \hbox{min} (x + \varepsilon ,0) \).
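For reference, a minimal numpy sketch of this singular value thresholding step is given below; the variable names are illustrative assumptions and \( \eta \) follows the definition above.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding U S_tau(Lambda) V^T, used for the Z-update of Eq. (8)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)        # soft threshold the singular values
    return (U * s_shrunk) @ Vt

# Z-update, assuming Zk, Jk, Ek, Y1, Y2, mu, alpha and eta = ||X||_2^2 are given:
# A = Zk + (X.T @ (X - X @ Zk - Ek + Y1 / mu)
#           - (Zk - Jk + Y2 / mu) - (2 * alpha / mu) * Zk) / eta
# Z_next = svt(A, 1.0 / (eta * mu))
```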

Update \( J_{k + 1} \) with \( Z_{k + 1} \), \( J_{k} \), \( E_{k} \) fixed.

$$ J_{k + 1} = \mathop {\arg \hbox{min} }\limits_{J \ge 0} \frac{\beta }{{\mu_{k} }}||W \odot J||_{1} + \frac{1}{2}||J - (Z_{k + 1} + Y_{2,k} /\mu_{k} )||_{F}^{2} $$
(9)

Equation (9) can be solved by the soft thresholding operator, and the solution is

$$ (J_{k + 1} )_{ij} = \hbox{max} (S_{{\varepsilon_{ij} }} [(Z_{k + 1} )_{ij} + (Y_{2,k} )_{ij} /\mu_{k} ],0) $$
(10)

where \( \varepsilon_{ij} = \beta W_{ij} /\mu_{k} \).
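A compact numpy version of this update is sketched below, under the element-wise thresholds \( \varepsilon_{ij} \) given above; the function name is an illustrative assumption.

```python
import numpy as np

def update_J(Z_next, Y2, W, beta, mu):
    """J-update of Eq. (10): weighted soft thresholding followed by clipping at zero."""
    T = Z_next + Y2 / mu                       # target matrix Z_{k+1} + Y_{2,k}/mu_k
    eps = beta * W / mu                        # element-wise thresholds eps_ij
    shrunk = np.sign(T) * np.maximum(np.abs(T) - eps, 0.0)   # soft thresholding
    return np.maximum(shrunk, 0.0)             # keep the non-negative part
```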

Update \( E_{k + 1} \) with \( Z_{k + 1} \), \( J_{k + 1} \), \( E_{k} \) fixed.

$$ E_{k + 1} = \mathop {\arg \hbox{min} }\limits_{E} \frac{\lambda }{{\mu_{k} }}||E||_{2,1} + \frac{1}{2}||E - (X - XZ_{k + 1} + Y_{1,k} /\mu_{k} )||_{F}^{2} $$
(11)

The solution of Eq. (11) is

$$ E_{k + 1} = \Omega_{{\lambda /\mu_{k} }} (X - XZ_{k + 1} + Y_{1,k} /\mu_{k} ) $$
(12)

where \( \Omega \) is the 2,1-norm minimization operator [4]. If \( Y = \Omega_{\varepsilon } (X) \), then the ith column of \( Y \) is

$$Y(:,i) = \left\{ {\begin{array}{ll} {\frac{{||X(:,i)||_{2} -\varepsilon }}{{||X(:,i)||_{2} }}X(:,i)}, & \varepsilon < ||X(:,i)||_{2} \\ 0, & \varepsilon \ge ||X(:,i)||_{2} \\ \end{array} } \right. $$
(13)
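The operator of Eq. (13) and the E-update of Eq. (12) can be sketched in numpy as follows (the helper and variable names are illustrative assumptions):

```python
import numpy as np

def l21_shrink(X, eps):
    """Column-wise 2,1-norm minimization operator Omega_eps of Eq. (13)."""
    Y = np.zeros_like(X)
    norms = np.linalg.norm(X, axis=0)          # ||X(:, i)||_2 for every column
    keep = norms > eps                         # columns with norm larger than eps
    Y[:, keep] = X[:, keep] * ((norms[keep] - eps) / norms[keep])
    return Y

# E-update of Eq. (12), assuming X, Z_next, Y1, mu and lam are given:
# E_next = l21_shrink(X - X @ Z_next + Y1 / mu, lam / mu)
```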

The complete optimization procedure for LRWSG is summarized in Algorithm 1.
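Since the pseudo-code of Algorithm 1 is not reproduced here, the overall LADMAP loop can be sketched as follows, reusing the helpers svt, update_J and l21_shrink from the previous sketches. The default parameters, the multiplier updates, the penalty schedule and the stopping rule below are standard LADMAP choices stated as assumptions, not necessarily the exact Algorithm 1.

```python
import numpy as np

def lrwsg_ladmap(X, W, alpha=1.0, beta=0.3, lam=10.0,
                 mu=1e-2, mu_max=1e10, rho=1.5, tol=1e-6, max_iter=500):
    """Sketch of a LADMAP loop for problem (5), following Eqs. (8)-(13)."""
    n = X.shape[1]
    Z, J, E = np.zeros((n, n)), np.zeros((n, n)), np.zeros_like(X)
    Y1, Y2 = np.zeros_like(X), np.zeros((n, n))
    eta = np.linalg.norm(X, 2) ** 2            # eta = ||X||_2^2 (spectral norm squared)
    for _ in range(max_iter):
        A = Z + (X.T @ (X - X @ Z - E + Y1 / mu)
                 - (Z - J + Y2 / mu) - (2 * alpha / mu) * Z) / eta
        Z = svt(A, 1.0 / (eta * mu))                      # Eq. (8)
        J = update_J(Z, Y2, W, beta, mu)                  # Eqs. (9)-(10)
        E = l21_shrink(X - X @ Z + Y1 / mu, lam / mu)     # Eqs. (11)-(13)
        R1, R2 = X - X @ Z - E, Z - J                     # constraint residuals
        Y1, Y2 = Y1 + mu * R1, Y2 + mu * R2               # multiplier updates
        mu = min(rho * mu, mu_max)                        # penalty update
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E
```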

3.3 Graph Construction

Once problem (5) is solved, we obtain an optimal \( Z^{*} \). Different from traditional graph-based SSL, which constructs the adjacency matrix by Eq. (2), we utilize the post-processing method used for the subspace clustering problem [4]. Let \( Z^{*} = U^{*}\Lambda^{*} \left( {V^{*} } \right)^{T} \) be the skinny SVD of \( Z^{*} \) and define \( P = U^{*} \left( {\Lambda^{*} } \right)^{1/2} \). The adjacency matrix of LRWSG is then calculated as follows:

$$ (G)_{ij} = (PP^{T} )_{ij}^{2} $$
(14)

After we obtain the adjacency matrix \( G \), the LGC algorithm is employed to solve the semi-supervised classification problem.
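A short numpy sketch of this post-processing is given below (helper name and rank tolerance are illustrative assumptions); the resulting matrix can be fed to the LGC sketch from Sect. 2.

```python
import numpy as np

def lrwsg_graph(Z_star, tol=1e-10):
    """Adjacency matrix of Eq. (14): G_ij = ((P P^T)_ij)^2 with P = U* (Lambda*)^(1/2)."""
    U, s, _ = np.linalg.svd(Z_star, full_matrices=False)
    r = int(np.sum(s > tol * max(s[0], tol)))  # skinny SVD: keep the nonzero part
    P = U[:, :r] * np.sqrt(s[:r])              # P = U* (Lambda*)^(1/2)
    return (P @ P.T) ** 2                      # element-wise squaring, Eq. (14)

# G = lrwsg_graph(Z_star); labels = lgc_propagate(G, Y)   # reuse the LGC sketch above
```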

4 Experiment

In this section, we evaluate the effectiveness of LRWSG in semi-supervised classification experiments. LRWSG is compared with several LRR-related graphs, including LRR [4], NNLRS [5], LRRLC [6], SCLRR [7] and CLAR [8]. Classification accuracy, defined as the percentage of correctly classified test samples, is used to evaluate the semi-supervised classification performance. The parameters of the compared algorithms are tuned to achieve their best performance. In LRWSG, the parameter \( \alpha \) balances the effect of the nuclear norm and the Frobenius norm; based on extensive experiments, we set \( \alpha = 1 \). The parameter \( \lambda \) describes the noise level of the data, and we set \( \lambda = 10 \) in our experiments. The parameter \( \beta \) controls the effect of the sparse regularization term; we set \( \beta = 0.3 \) on the ORL and EYaleB databases and \( \beta = 0.1 \) on the AR database. The experiments are implemented on an Intel Core i7 4710MQ CPU with 8 GB of memory.

4.1 Databases

We select three face databases for our experiments: ORL, Extended Yale B (EYaleB) and AR. The ORL database contains 40 distinct subjects, each with 10 different images. The images were taken at different times, with varying lighting, facial expressions and facial details. The EYaleB database contains 64 face images under different illuminations for each of 38 individuals; in our experiments, we use the first 20 subjects and the first 50 images of each subject. The AR database contains 3120 images of 120 subjects with different facial expressions, lighting conditions and occlusions; the first 50 subjects are chosen for our experiments, with the first 20 images of each subject. All images are resized to \( 32 \times 32 \). Several sample images from the three face databases are shown in Fig. 1.

Fig. 1. Some sample images from the three databases: (a) ORL, (b) EYaleB, (c) AR

4.2 Experimental Results and Analysis

For each database, we randomly choose 10 % to 60 % of the samples from each class as labeled samples, and the remaining samples are used for testing. For each percentage of labeled samples, we repeat the experiment for 20 trials for each algorithm. Tables 1, 2 and 3 report the classification accuracies and standard deviations of each algorithm on ORL, EYaleB and AR, respectively.

Table 1. The classification accuracies and standard deviations (%) on ORL
Table 2. The classification accuracies and standard deviations (%) on EYaleB
Table 3. The classification accuracies and standard deviations (%) on AR

From the results, we can observe that:

(1) LRWSG achieves the highest accuracies on all three databases. LRWSG utilizes the nuclear norm and the Frobenius norm to estimate the rank function. Meanwhile, the weighted sparse regularization term with shape interaction information is incorporated into the objective function. Therefore, LRWSG can capture both the global subspace structure and the local linear structure exactly. Moreover, the standard deviations of LRWSG are usually small, which shows the stability of LRWSG.

(2) CLAR uses the logarithm-determinant function to estimate the rank function, which improves the performance of LRR. However, compared with the algorithms that consider both the low rank and the sparse property, CLAR performs worse. Building on LRR, NNLRS, LRRLC and SCLRR propose different sparsity constraints. Although the weight matrices of their sparse regularization terms differ, the performance of these three algorithms is similar. With a more exact rank estimate and an informative weighted sparse matrix, LRWSG performs better than these three algorithms.

(3) As the number of labeled samples increases, the classification accuracy of each algorithm also increases. When more labeled samples are given, the label information is more abundant and every algorithm performs well. When fewer labeled samples are given, classification becomes more difficult, but LRWSG still achieves higher classification accuracies. For example, with 10 % labeled samples, the accuracy of LRWSG is 82.10 % on the ORL database, which is 6.17 % higher than the best result obtained by the other algorithms.

5 Conclusion

This paper proposes a novel semi-supervised learning algorithm based on a low rank and weighted sparse graph (LRWSG), and applies it to face recognition. In order to capture the data structure exactly, LRWSG makes use of the nuclear norm and the Frobenius norm to estimate the rank function, and adds a weighted sparse constraint with shape interaction information to the objective function. LADMAP is employed to solve the optimization problem, and with an effective post-processing method, the graph is constructed and used for semi-supervised classification. Experimental results on the ORL, EYaleB and AR databases show that the proposed approach achieves better classification performance.