
1 Introduction

Over the past decades, dimension reduction has been a research hotspot in face recognition. It aims to discover a low-dimensional representation of the original high-dimensional data, capture the intrinsic structure of the data, and ultimately benefit classification. Principal Component Analysis (PCA) [1] and Linear Discriminant Analysis (LDA) [2] are the most classic methods. PCA seeks a set of optimal projection directions that maximize the data covariance in an unsupervised way, while LDA uses label information to construct the within-class and between-class scatters and thereby finds discriminant projection directions. LDA therefore generally outperforms PCA in face recognition and other recognition tasks. However, both methods fail to reveal the local structure of the data because of their global linearity.

To capture local information, manifold learning-based methods have been proposed, which generally assume that closely located samples are likely to have similar low-dimensional manifold embeddings. As the most representative manifold learning method, Locality Preserving Projections (LPP) [3] applies k-nearest-neighbor search to explore the local structure around each sample. In contrast to LPP, Marginal Fisher Analysis (MFA) [4] uses the k1 nearest intraclass samples to build an intrinsic graph and the k2 nearest interclass marginal samples to construct a penalty graph; these two graphs describe the intraclass compactness and the interclass separability, respectively. It has been verified that manifold learning-based dimensionality reduction methods can be well explained by the graph embedding framework [4]. However, unlike the k1 nearest intraclass samples and the k2 nearest interclass samples, the distant intraclass and interclass samples are not considered during graph construction. As a result, distant intraclass samples may remain far from each other and distant interclass samples may remain close to each other in the projected feature space, which inevitably harms classification. To address this problem, Huang et al. proposed Graph Discriminant Embedding (GDE) [5], which accounts for the importance of distant intraclass and interclass samples in the graph embedding framework. It uses a strictly monotonically decreasing function to describe the importance of nearby intraclass and interclass samples and a strictly monotonically increasing function to characterize the importance of distant ones. Consequently, the different contributions of nearby and distant samples to the within-class and between-class scatters are fully considered. In most graph embedding methods, however, the importance of a pair of samples is based directly on the similarity between the two samples. Outliers therefore influence graph construction and, in particular, degrade the intraclass compactness.

On the other hand, considering that unlabeled samples exist in practical applications, Cai et al. [6] proposed Semi-Supervised Discriminant Analysis (SDA), which performs LDA on the labeled samples and builds a k-nearest-neighbor graph on the unlabeled samples to preserve local neighbor relationships. It effectively applies semi-supervised learning to achieve a low-dimensional embedding. However, the neighbor parameter k is difficult to set; an inappropriate k degrades the quality of the constructed graph and thus the semi-supervised discriminant performance. In recent years, Low Rank Representation (LRR) [7] has been proposed to capture the global structure of samples via low-rank and sparse matrix decomposition. The decomposed low-rank part reflects the intrinsic affinity of the samples; in other words, similar samples are clustered into the same subspace. However, LRR is essentially unconstrained and tends to be sensitive to noise when the data are corrupted. Therefore, Guo et al. [8] imposed an L1-norm sparsity constraint on the low-rank decomposition, aiming to obtain more robust low-rank relationships.

Motivated by the above discussion, in this paper we introduce a novel graph embedding discriminant analysis method and its semi-supervised extension for face recognition. For the labeled samples, we redefine the weights of the intraclass graph through the mean of the intraclass samples: the similarity between two intraclass samples is based on their similarities to the intraclass mean rather than directly on the distance between them. This effectively reduces the influence of outliers on the within-class scatter. Meanwhile, following the idea of GDE, we account for the different contributions of interclass samples to the between-class scatter. We then build an improved discriminant criterion that pulls intraclass samples as close together as possible while pushing interclass samples far apart. More critically, the constructed graph embedding framework relieves the influence of outliers on the intraclass compactness. On the other hand, considering that a large number of unlabeled samples may exist in practical applications, we apply low rank representation constrained by L1-norm sparsity to capture their global low-rank relationships, and then minimize an error function that preserves the low-rank structure of the samples. Finally, we integrate the improved graph embedding discriminant criterion and the LRR error function to achieve semi-supervised learning. Experimental results on the ORL, AR and FERET face datasets demonstrate the effectiveness and robustness of our method.

The remainder of this paper is organized as follows. Section 2 briefly reviews MFA and GDE. Section 3 introduces the proposed method. Section 4 shows the experimental results and the conclusion is given in Sect. 5.

2 Related Works

Given a set of n training samples X = [x1, x2, …, xn] containing C classes, each sample xi belongs to a class lc, c ∈ {1, 2, …, C}. Using the projection matrix A, the representation of a sample xi in the low-dimensional space is yi = ATxi. In this section, we briefly review MFA and GDE.

2.1 Marginal Fisher Analysis (MFA)

As the most representative graph embedding method, MFA [4] builds an intrinsic graph to characterize the compactness of intraclass samples and a penalty graph to describe the separability of interclass samples. Let Gintrinsic = {V, E, WMFA} denote the intrinsic graph and Gpenalty = {V, E, BMFA} denote the penalty graph, where V is the set of vertices (samples) and E is the set of edges connecting samples. WMFA and BMFA are two adjacency matrices specifying the similarity between samples, whose weights are defined as [4]:

$$ {\mathbf{W}}_{ij}^{MFA} { = }\left\{ {\begin{array}{*{20}l} { 1 ,} \hfill & {{\text{if}}\;{\mathbf{x}}_{i} \in N_{{k_{1} }}^{ + } (j)\;\;{\text{or}}\;\;{\mathbf{x}}_{j} \in N_{{k_{1} }}^{ + } (i)} \hfill \\ { 0 ,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$
(1)
$$ {\mathbf{B}}_{ij}^{MFA} { = }\left\{ {\begin{array}{*{20}l} { 1 ,} \hfill & {{\text{if}}\;\;{\mathbf{x}}_{i} \in P_{{k_{2} }} (j)\;\;{\text{or}}\;\;{\mathbf{x}}_{j} \in P_{{k_{2} }} (i)} \hfill \\ { 0 ,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$
(2)

where \( N_{{k_{1} }}^{ + } (i) \) indicates the k1 nearest intraclass samples of xi, and \( P_{{k_{2} }} (i) \) indicates the k2 nearest interclass marginal samples of xi. The intraclass scatter \( {\mathbf{S}}_{w}^{MFA} \) and the interclass scatter \( {\mathbf{S}}_{b}^{MFA} \) can be formulated as [4]:

$$ {\mathbf{S}}_{w}^{MFA} = \sum\nolimits_{ij} { | |{\mathbf{y}}_{i} - {\mathbf{y}}_{j} | |^{ 2} {\mathbf{W}}_{ij}^{MFA} } = 2tr\left( {{\mathbf{A}}^{\text{T}} {\mathbf{XL}}^{MFA} {\mathbf{X}}^{\text{T}} {\mathbf{A}}} \right) $$
(3)
$$ {\mathbf{S}}_{b}^{MFA} = \sum\nolimits_{ij} { | |{\mathbf{y}}_{i} - {\mathbf{y}}_{j} | |^{ 2} {\mathbf{B}}_{ij}^{MFA} } = 2tr\left( {{\mathbf{A}}^{\text{T}} {\mathbf{XL}}^{{MFA^{\prime}}} {\mathbf{X}}^{\text{T}} {\mathbf{A}}} \right) $$
(4)

where \( {\mathbf{L}}^{MFA} = {\mathbf{D}}^{MFA} - {\mathbf{W}}^{MFA} \) and \( {\mathbf{L}}^{{MFA^{\prime}}} = {\mathbf{D}}^{{MFA^{\prime}}} - {\mathbf{B}}^{MFA} \) are the Laplacian matrices; \( {\mathbf{D}}_{ii}^{MFA} = \sum\nolimits_{j} {{\mathbf{W}}_{ij}^{MFA} } \) and \( {\mathbf{D}}_{ii}^{{MFA^{\prime}}} { = }\sum\nolimits_{j} {{\mathbf{B}}_{ij}^{MFA} } \) are the diagonal matrices. The discriminant criterion function of MFA can be expressed as [4]:

$$ J_{MFA} ({\mathbf{A}} )= \mathop { \hbox{min} }\limits_{{\mathbf{A}}} \;\frac{{{\mathbf{S}}_{w}^{MFA} }}{{{\mathbf{S}}_{b}^{MFA} }} = \mathop { \hbox{min} }\limits_{{\mathbf{A}}} \;\frac{{tr ({\mathbf{A}}^{\text{T}} {\mathbf{XL}}^{MFA} {\mathbf{X}}^{\text{T}} {\mathbf{A}} )}}{{tr ({\mathbf{A}}^{\text{T}} {\mathbf{XL}}^{{MFA^{\prime}}} {\mathbf{X}}^{\text{T}} {\mathbf{A}} )}} $$
(5)
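To make the construction above concrete, the following NumPy sketch builds the two MFA graphs of Eqs. (1)-(2) and solves the criterion of Eq. (5) through a generalized eigenvalue problem. The data layout (samples as columns of X), the small regularizer on the penalty term, and the helper name mfa_projection are illustrative assumptions, not a reference implementation.

```python
import numpy as np
from scipy.linalg import eigh

def mfa_projection(X, labels, k1=5, k2=20, p=10):
    """Sketch of MFA (Eqs. 1-5); X is d x n with samples as columns."""
    labels = np.asarray(labels)
    n = X.shape[1]
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)   # pairwise squared distances
    W = np.zeros((n, n))
    B = np.zeros((n, n))
    for i in range(n):
        same = labels == labels[i]
        same[i] = False
        diff = labels != labels[i]
        # Eq. (1): k1 nearest intraclass neighbors define the intrinsic graph
        for j in np.where(same)[0][np.argsort(d2[i, same])][:k1]:
            W[i, j] = W[j, i] = 1.0
        # Eq. (2): k2 nearest interclass (marginal) neighbors define the penalty graph
        for j in np.where(diff)[0][np.argsort(d2[i, diff])][:k2]:
            B[i, j] = B[j, i] = 1.0
    L_w = np.diag(W.sum(axis=1)) - W          # Laplacian of the intrinsic graph
    L_b = np.diag(B.sum(axis=1)) - B          # Laplacian of the penalty graph
    Sw = X @ L_w @ X.T                        # Eq. (3) up to the factor 2
    Sb = X @ L_b @ X.T                        # Eq. (4) up to the factor 2
    # Eq. (5): keep the eigenvectors with the smallest generalized eigenvalues
    evals, evecs = eigh(Sw, Sb + 1e-6 * np.eye(Sb.shape[0]))
    return evecs[:, :p]                       # projection matrix A (d x p)
```

The same scaffolding (graphs, Laplacians, generalized eigensolver) is reused by the methods discussed next; only the weight definitions change.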

2.2 Graph Discriminant Embedding (GDE)

The purpose of MFA is to construct two graphs that jointly consider the local structure and the discriminant structure of the samples. However, MFA neglects the importance of distant samples, which may influence the discriminant performance. Therefore, GDE redefines the intrinsic graph Gintrinsic and the penalty graph Gpenalty. The weight matrices WGDE and BGDE of these two graphs are defined as [5]:

$$ {\mathbf{W}}_{ij}^{GDE} = \left\{ {\begin{array}{*{20}l} {e^{{ - \frac{{ | |{\mathbf{x}}_{i} - {\mathbf{x}}_{j} | |^{ 2} }}{t}}} ,} \hfill & {{\text{if}}\;\;{\mathbf{x}}_{j} \in N_{k1}^{ + } (i) \, \;{\text{or}}\; \, {\mathbf{x}}_{i} \in N_{k1}^{ + } (j),\,l_{i} = l_{j} } \hfill \\ {e^{{ - \frac{t}{{ | |{\mathbf{x}}_{i} - {\mathbf{x}}_{j} | |^{ 2} }}}} ,} \hfill & {{\text{if}}\;\;{\mathbf{x}}_{j} \notin N_{k1}^{ + } (i) \, \;{\text{and}}\; \, {\mathbf{x}}_{i} \notin N_{k1}^{ + } (j),\,l_{i} = l_{j} } \hfill \\ { 0 ,} \hfill & {{\text{if}}\;\;l_{i} \ne l_{j} } \hfill \\ \end{array} } \right. $$
(7)
$$ {\mathbf{B}}_{ij}^{GDE} = \left\{ {\begin{array}{*{20}l} {e^{{ - \frac{{ | |{\mathbf{x}}_{i} - {\mathbf{x}}_{j} | |^{ 2} }}{t}}} ,} \hfill & {{\text{if}}\;\;{\mathbf{x}}_{j} \in P_{k2} (i)\;{\text{or}}\;{\mathbf{x}}_{i} \in P_{k2} (j),\;l_{i} \ne l_{j} } \hfill \\ {e^{{ - \frac{t}{{ | |{\mathbf{x}}_{i} - {\mathbf{x}}_{j} | |^{ 2} }}}} ,} \hfill & {{\text{if}}\;\;{\mathbf{x}}_{j} \notin P_{k2} (i)\;{\text{and}}\;{\mathbf{x}}_{i} \notin P_{k2} (j),\;l_{i} \ne l_{j} } \hfill \\ { 0 ,} \hfill & {{\text{if}}\;\;l_{i} = l_{j} } \hfill \\ \end{array} } \right. $$
(8)

where the parameter t (t > 0) is used to tune the weights between samples. The intraclass scatter \( {\mathbf{S}}_{w}^{GDE} \) and the interclass scatter \( {\mathbf{S}}_{b}^{GDE} \) can be computed as [5]:

$$ {\mathbf{S}}_{w}^{GDE} = \sum\nolimits_{ij} { | |{\mathbf{y}}_{i} - {\mathbf{y}}_{j} | |^{ 2} {\mathbf{W}}_{ij}^{GDE} } = 2tr\left( {{\mathbf{A}}^{\text{T}} {\mathbf{XL}}^{GDE} {\mathbf{X}}^{\text{T}} {\mathbf{A}}} \right) $$
(9)
$$ {\mathbf{S}}_{b}^{GDE} = \sum\nolimits_{ij} { | |{\mathbf{y}}_{i} - {\mathbf{y}}_{j} | |^{ 2} {\mathbf{B}}_{ij}^{GDE} } = 2tr\left( {{\mathbf{A}}^{\text{T}} {\mathbf{XL}}^{{GDE^{\prime}}} {\mathbf{X}}^{\text{T}} {\mathbf{A}}} \right) $$
(10)

where \( {\mathbf{L}}^{GDE} = {\mathbf{D}}^{GDE} - {\mathbf{W}}^{GDE} \) and \( {\mathbf{L}}^{{GDE^{\prime}}} = {\mathbf{D}}^{{GDE^{\prime}}} - {\mathbf{B}}^{GDE} \) are the Laplacian matrices; \( {\mathbf{D}}_{ii}^{GDE} = \sum\nolimits_{j} {{\mathbf{W}}_{ij}^{GDE} } \) and \( {\mathbf{D}}_{ii}^{{GDE^{\prime}}} = \sum\nolimits_{j} {{\mathbf{B}}_{ij}^{GDE} } \) are diagonal matrices. Therefore, the discriminant criterion function of GDE can be written as [5]:

$$ J_{GDE} ({\mathbf{A}}) = \mathop {\hbox{min} }\limits_{{\mathbf{A}}} \;\frac{{{\mathbf{S}}_{w}^{GDE} }}{{{\mathbf{S}}_{b}^{GDE} }} = \mathop {\hbox{min} }\limits_{{\mathbf{A}}} \;\frac{{tr({\mathbf{A}}^{T} {\mathbf{XL}}^{GDE} {\mathbf{X}}^{T} {\mathbf{A}})}}{{tr({\mathbf{A}}^{T} {\mathbf{XL}}^{{GDE^{\prime}}} {\mathbf{X}}^{T} {\mathbf{A}})}} $$
(11)
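Since GDE differs from MFA only in the edge weights, a short sketch of Eqs. (7)-(8) suffices: nearby pairs receive a heat-kernel weight and distant pairs of the same kind an inverted heat-kernel weight. The helper name gde_weights and the default parameter values are assumptions for illustration.

```python
import numpy as np

def gde_weights(X, labels, k1=5, k2=20, t=8.0):
    """Sketch of the GDE graphs (Eqs. 7-8): exp(-d2/t) for nearby pairs,
    exp(-t/d2) for distant pairs, separately for intra- and interclass pairs."""
    labels = np.asarray(labels)
    n = X.shape[1]
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    same = labels[:, None] == labels[None, :]
    near_w = np.zeros((n, n), dtype=bool)     # k1-NN among intraclass samples
    near_b = np.zeros((n, n), dtype=bool)     # k2-NN among interclass samples
    for i in range(n):
        s = same[i].copy()
        s[i] = False
        near_w[i, np.where(s)[0][np.argsort(d2[i, s])][:k1]] = True
        d = ~same[i]
        near_b[i, np.where(d)[0][np.argsort(d2[i, d])][:k2]] = True
    near_w |= near_w.T                        # "xj in N(i) or xi in N(j)"
    near_b |= near_b.T
    heat = np.exp(-d2 / t)                    # weight for nearby pairs
    inv = np.exp(-t / (d2 + 1e-12))           # weight for distant pairs
    W = np.where(near_w, heat, inv) * same    # Eq. (7): intraclass pairs only
    np.fill_diagonal(W, 0.0)
    B = np.where(near_b, heat, inv) * ~same   # Eq. (8): interclass pairs only
    return W, B
```

These W and B then enter the Laplacian and trace-ratio machinery of Eqs. (9)-(11) exactly as in the MFA sketch above.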

In the GDE method, the similarity defined by the monotonic functions effectively handles the effect of distant samples on the recognition results. However, the similarity between samples is still defined directly by their pairwise distance, so outliers still have a certain impact on the discriminant performance. Furthermore, GDE computes similarities only for labeled samples, without taking unlabeled samples into account.

3 Methodology

3.1 The Proposed Method

As described in Sect. 2.2, GDE does not consider the influence of outliers and unlabeled samples on the recognition performance. In this paper, we propose a novel graph embedding discriminant analysis method in which the similarity between two intraclass samples is based on their similarities to the intraclass mean. This design aims to eliminate the impact of outliers on the within-class scatter and thus improve the recognition performance.

Let Gintrinsic = {V, E, W} denote the redefined intrinsic graph and Gpenalty = {V, E, B} denote the penalty graph; the weight matrices W and B are given as follows:

$$ {\mathbf{W}}_{ij} = \left\{ {\begin{array}{*{20}l} {e^{{ - \frac{{||{\mathbf{x}}_{i} - {\tilde{\mathbf{x}}}_{i} ||^{2} + ||{\mathbf{x}}_{j} - {\tilde{\mathbf{x}}}_{i} ||^{2} }}{2t}}} ,} \hfill & {{\text{if }}\;l_{i} = l_{j} } \hfill \\ {0,} \hfill & {{\text{if}}\;\;l_{i} \ne l_{j} } \hfill \\ \end{array} } \right. $$
(12)
$$ {\mathbf{B}}_{ij}^{{}} = \left\{ {\begin{array}{*{20}l} {e^{{ - \frac{{||{\mathbf{x}}_{i} - {\mathbf{x}}_{j} ||^{2} }}{t}}} ,} \hfill & {{\text{if}}\;\;{\mathbf{x}}_{i} \in D_{k} ({\mathbf{x}}_{j} )\;\;{\text{or}}\;\;{\mathbf{x}}_{j} \in D_{k} ({\mathbf{x}}_{i} ),\,l_{i} \ne l_{j} } \hfill \\ {e^{{ - \frac{t}{{||{\mathbf{x}}_{i} - {\mathbf{x}}_{j} ||^{2} }}}} ,} \hfill & {{\text{if}}\;\;{\mathbf{x}}_{i} \notin D_{k} ({\mathbf{x}}_{j} )\;\;{\text{and}}\;\;{\mathbf{x}}_{j} \notin D_{k} ({\mathbf{x}}_{i} ),\,l_{i} \ne l_{j} } \hfill \\ {0,} \hfill & {{\text{if}}\;\;l_{i} = l_{j} } \hfill \\ \end{array} } \right. $$
(13)

where \( {\tilde{\mathbf{x}}}_{i} \) denotes the mean of the samples in the class of xi, and Dk(xi) denotes the k nearest interclass neighbors of xi. Unlike GDE, we define the weight matrix W through the distances between the samples and their within-class mean instead of the distance between the samples themselves; following GDE, the importance of interclass samples is still described through monotonic functions.
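A sketch of Eqs. (12)-(13) may help: the intraclass weight depends only on how close xi and xj are to their shared class mean, so a single distant outlier no longer drags the whole weight down. The function and parameter names below are illustrative assumptions.

```python
import numpy as np

def proposed_weights(X, labels, k=4, t=8.0):
    """Sketch of Eqs. (12)-(13); X is d x n with samples as columns."""
    labels = np.asarray(labels)
    n = X.shape[1]
    same = labels[:, None] == labels[None, :]
    # Eq. (12): intraclass weights via distances to the shared class mean
    means = {c: X[:, labels == c].mean(axis=1) for c in np.unique(labels)}
    dm = np.array([np.sum((X[:, i] - means[labels[i]]) ** 2) for i in range(n)])
    W = np.exp(-(dm[:, None] + dm[None, :]) / (2.0 * t)) * same
    np.fill_diagonal(W, 0.0)
    # Eq. (13): GDE-style penalty graph over interclass pairs (k interclass neighbors)
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    near = np.zeros((n, n), dtype=bool)
    for i in range(n):
        idx = np.where(~same[i])[0]
        near[i, idx[np.argsort(d2[i, idx])][:k]] = True
    near |= near.T
    B = np.where(near, np.exp(-d2 / t), np.exp(-t / (d2 + 1e-12))) * ~same
    return W, B
```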

According to the two constructed graphs, the intraclass scatter \( {\mathbf{S}}_{w} \) and the interclass scatter \( {\mathbf{S}}_{b} \) can be formulated as:

$$ {\mathbf{S}}_{w} = \sum\limits_{ij} {||{\mathbf{y}}_{i} - {\mathbf{y}}_{j} ||^{2} {\mathbf{W}}_{ij}^{{}} } = \sum\limits_{ij} {||{\mathbf{A}}^{\text{T}} {\mathbf{x}}_{i} - {\mathbf{A}}^{\text{T}} {\mathbf{x}}_{j} ||^{2} {\mathbf{W}}_{ij}^{{}} } = 2tr({\mathbf{A}}^{\text{T}} {\mathbf{XLX}}^{\text{T}} {\mathbf{A}}) $$
(14)
$$ {\mathbf{S}}_{b} = \sum\limits_{ij} {||{\mathbf{y}}_{i} - {\mathbf{y}}_{j} ||^{2} {\mathbf{B}}_{ij} } = \sum\limits_{ij} {||{\mathbf{A}}^{\text{T}} {\mathbf{x}}_{i} - {\mathbf{A}}^{\text{T}} {\mathbf{x}}_{j} ||^{2} {\mathbf{B}}_{ij} } = 2tr({\mathbf{A}}^{\text{T}} {\mathbf{XL^{\prime}X}}^{\text{T}} {\mathbf{A}}) $$
(15)

where \( {\mathbf{L}} = {\mathbf{D}} - {\mathbf{W}} \) and \( {\mathbf{L^{\prime}}} = {\mathbf{D^{\prime}}} - {\mathbf{B}} \) are the Laplacian matrices, and \( {\mathbf{D}}_{ii} = \sum\nolimits_{j} {{\mathbf{W}}_{ij} } \) and \( {\mathbf{D^{\prime}}}_{ii} = \sum\nolimits_{j} {{\mathbf{B}}_{ij} } \) are diagonal matrices. Therefore, the proposed discriminant criterion function is given by:

$$ J_{Ours} ({\mathbf{A}}) = \mathop {\hbox{min} }\limits_{{\mathbf{A}}} \;\frac{{tr({\mathbf{A}}^{\text{T}} {\mathbf{XLX}}^{\text{T}} {\mathbf{A}})}}{{tr({\mathbf{A}}^{\text{T}} {\mathbf{XL^{\prime}X}}^{\text{T}} {\mathbf{A}})}} $$
(16)

Consequently, our method avoids the influence of outliers on the within-class scatter, making intraclass samples close to each other and interclass samples far away from each other in the projected space.

3.2 Semi-Supervised Extension (SSE)

For unlabeled samples, SDA [6] builds a k-nearest-neighbor graph to preserve local neighbor relationships. However, the neighbor parameter k in SDA is difficult to set. In recent years, LRR [7] was proposed; it uses low-rank and sparse matrix decomposition to express the global structure of the samples, so that the low-rank structure reflects their intrinsic affinity. Nevertheless, LRR is essentially unconstrained and may be affected by noise. In this paper, we use low rank representation with an L1-norm sparsity constraint to capture the low-rank relationships of the samples, and minimize an error function that preserves their low-rank structure. For the given training samples X, we reconstruct the original samples through the data dictionary D = [d1, d2, …, dm] ∈ Rd×m; that is, X = DZ, where Z = [z1, z2, …, zn] ∈ Rm×n is the reconstruction coefficient matrix. Considering that the samples may also be contaminated by noise, the problem can be modeled as follows [8]:

$$ \left\{ {\begin{array}{*{20}l} {\mathop {\hbox{min} }\limits_{z} \left\| {\mathbf{Z}} \right\|_{*} + \lambda \left\| {\mathbf{E}} \right\|_{2,1} + \gamma \left\| {\mathbf{Z}} \right\|_{1} } \hfill \\ {s.t.\;\;{\mathbf{X}} = {\mathbf{DZ}} + {\mathbf{E}}} \hfill \\ \end{array} } \right. $$
(17)

where ||Z||* denotes the nuclear norm of Z, i.e., the sum of the singular values of the matrix. \( \left\| {\mathbf{E}} \right\|_{2,1} = \sum\nolimits_{j = 1}^{n} {\sqrt {\sum\nolimits_{i = 1}^{d} {\left( {{\mathbf{E}}_{ij} } \right)^{2} } } } \) is the l2,1-norm of the noise matrix E, which models sample-specific noise. ||Z||1 denotes the L1-norm of the matrix, which ensures the sparsity of Z. The parameter λ balances the effect of the noise term, and the parameter γ balances the effect of the sparsity term. By applying the L1-norm sparsity constraint and adding the error term to handle noise contamination, we can obtain more robust low-rank relationships.
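Problem (17) is convex, and on small problems it can be handed directly to a generic solver as a sanity check of the formulation; practical implementations typically use an inexact ALM/ADMM scheme instead. The sketch below is such a small-scale check and assumes, as in the next paragraph, that the data matrix X itself serves as the dictionary D.

```python
import cvxpy as cp
import numpy as np

def lrr_l1(X, lam=0.1, gamma=0.1):
    """Small-scale sketch of Eq. (17) with D = X:
       min ||Z||_* + lam * ||E||_{2,1} + gamma * ||Z||_1  s.t.  X = X Z + E."""
    d, n = X.shape
    Z = cp.Variable((n, n))
    E = cp.Variable((d, n))
    l21 = cp.sum(cp.norm(E, 2, axis=0))            # column-wise l2,1-norm of E
    l1 = cp.sum(cp.abs(Z))                         # entrywise L1-norm of Z
    objective = cp.Minimize(cp.normNuc(Z) + lam * l21 + gamma * l1)
    problem = cp.Problem(objective, [X == X @ Z + E])
    problem.solve(solver=cp.SCS)
    return Z.value, E.value
```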

According to Eq. (17), with the data matrix X itself taken as the dictionary, we have xi = Xzi + ei, where ei is the i-th column of the noise matrix E. Assuming that xi retains its low-rank structure after linear projection into the low-dimensional feature space, the influence of noise on classification can be reduced by minimizing the representation error ei. Therefore, we minimize the norm of ei as follows:

$$ \sum\nolimits_{i} {\left\| {{\mathbf{A}}^{\text{T}} {\mathbf{e}}_{i} } \right\|^{2} } = \sum\nolimits_{i} {\left\| {{\mathbf{A}}^{\text{T}} \left( {{\mathbf{x}}_{i} - {\mathbf{Xz}}_{i} } \right)} \right\|^{2} } = tr\left( {{\mathbf{A}}^{\text{T}} {\mathbf{S}}_{{\mathbf{e}}} {\mathbf{A}}} \right) $$
(18)

where Se can be computed by:

$$ {\mathbf{S}}_{{\mathbf{e}}} = \sum\nolimits_{i} {\left( {{\mathbf{x}}_{i} - {\mathbf{Xz}}_{i} } \right)\left( {{\mathbf{x}}_{i} - {\mathbf{Xz}}_{i} } \right)^{\text{T}} } = {\mathbf{X}}\left( {{\mathbf{I}} - {\mathbf{Z}}} \right)\left( {{\mathbf{I}} - {\mathbf{Z}}} \right)^{\text{T}} {\mathbf{X}}^{\text{T}} $$
(19)
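Given the coefficients Z, the scatter of Eq. (19) is a one-liner; this illustrative helper is reused in the algorithm sketch of Sect. 3.3.

```python
import numpy as np

def lrr_error_scatter(X, Z):
    """Eq. (19): Se = X (I - Z)(I - Z)^T X^T."""
    IZ = np.eye(Z.shape[0]) - Z
    return X @ IZ @ IZ.T @ X.T
```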

Finally, we integrate the improved graph embedding discriminant criterion and the LRR error function to achieve semi-supervised learning. The discriminant criterion function of our extended method is given as follows:

$$ J_{Ours - SSE} ({\mathbf{A}}) = \mathop {\hbox{min} }\limits_{{\mathbf{A}}} \;\frac{{tr({\mathbf{A}}^{\text{T}} {\mathbf{XLX}}^{\text{T}} {\mathbf{A}}) + \beta tr({\mathbf{A}}^{\text{T}} {\mathbf{S}}_{{\mathbf{e}}} {\mathbf{A}})}}{{tr({\mathbf{A}}^{\text{T}} {\mathbf{XL^{\prime}X}}^{\text{T}} {\mathbf{A}})}} $$
(20)

where the regularization parameter β (β ∈ (0, 1)) balances the contribution of the LRR error term in the discriminant criterion function.

3.3 Algorithm

The dimensionality of face images is often very high, so we first use the PCA projection APCA for dimensionality reduction. Let X = [XL, XU] denote the dataset in the PCA subspace, where XL is the set of labeled samples and XU is the set of unlabeled samples. The proposed algorithm is given as follows:

Step1:

For XL, construct the two adjacency matrices W and B based on Eqs. (12) and (13), and compute the intraclass scatter \( {\mathbf{S}}_{w}^{{}} \) and the interclass scatter \( {\mathbf{S}}_{b}^{{}} \) based on Eqs. (14) and (15).

Step2:

Taking the unlabeled samples in the dataset into account, compute the reconstruction coefficient matrix Z based on Eq. (17), and then compute the matrix Se based on Eq. (19). Let a1, a2, …, ap be the eigenvectors associated with the p minimal eigenvalues of the generalized eigenvalue problem \( ({\mathbf{XLX}}^{\text{T}} + \beta {\mathbf{S}}_{{\mathbf{e}}} ){\mathbf{A}} = \lambda {\mathbf{XL^{\prime}X}}^{\text{T}} {\mathbf{A}} \). The optimal projection matrix learned by our extension method is then AOurs-SSE = [a1, a2, …, ap], and the final projection is A = APCA \( \cdot \) AOurs-SSE.

Step3:

For a test sample x, its representation in the low-dimensional space is y = ATx, and a classifier can then be used to predict its class.
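Putting Steps 1-3 together, an end-to-end sketch might look as follows. It reuses the illustrative helpers from the earlier sketches (proposed_weights, lrr_l1, lrr_error_scatter), uses scikit-learn's PCA for the preprocessing step, and solves Eq. (20) as a generalized eigenvalue problem; all names and defaults are assumptions rather than the authors' code.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.decomposition import PCA

def fit(X_l, labels, X_u, k=4, t=8.0, beta=0.3, pca_dim=50, p=30):
    """Steps 1-3 of the proposed semi-supervised method (illustrative sketch)."""
    X_all = np.hstack([X_l, X_u])                        # d x n, samples as columns
    pca = PCA(n_components=pca_dim).fit(X_all.T)         # PCA preprocessing (A_PCA)
    Xp = pca.transform(X_all.T).T                        # data in the PCA subspace
    n_l = X_l.shape[1]
    # Step 1: graphs and scatters on the labeled samples (Eqs. 12-15)
    W, B = proposed_weights(Xp[:, :n_l], labels, k=k, t=t)
    L = np.diag(W.sum(axis=1)) - W
    Lp = np.diag(B.sum(axis=1)) - B
    Sw = Xp[:, :n_l] @ L @ Xp[:, :n_l].T
    Sb = Xp[:, :n_l] @ Lp @ Xp[:, :n_l].T
    # Step 2: low-rank coefficients on all samples and the error scatter Se (Eqs. 17, 19)
    Z, _ = lrr_l1(Xp)
    Se = lrr_error_scatter(Xp, Z)
    # Eq. (20): p smallest generalized eigenvectors of (Sw + beta*Se) a = lambda * Sb a
    evals, evecs = eigh(Sw + beta * Se, Sb + 1e-6 * np.eye(Sb.shape[0]))
    A = evecs[:, :p]
    return pca, A

def project(pca, A, x):
    """Step 3: embed a test sample x, then feed y to any classifier (NNC, SVM, ...)."""
    return A.T @ pca.transform(x.reshape(1, -1)).T
```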

4 Experiments

In this section, we verify the proposed method with multiple classifiers, i.e., NNC (nearest neighbor classifier), SVM (support vector machine), LRC (linear regression classifier) [9] and LLRC (locality-regularized linear regression classifier) [10], on three public face datasets (ORL, AR and FERET), and in particular evaluate its performance under noise and blur conditions. All experiments are conducted on a computer with an Intel Core i7-7700 3.6 GHz CPU and 8 GB RAM running Windows 7, and are implemented in MATLAB.

4.1 Experiments on ORL Dataset

The ORL face dataset contains 400 images of 40 people. All images are in grayscale and manually cropped to 92 × 112 pixels. On the ORL dataset, we select 5 images of each person as training samples and use the remaining images for testing. Furthermore, the first r (r = 2, 3) images of each person are labeled. NNC, SVM, LRC and LLRC are used for classification. The dimension of the PCA subspace is set to 50. In our method, the interclass neighbor parameter k is 4 and the tuning parameter t of the adjacency matrices is set to 2³. The neighbor parameter k in LPP is 4. For MFA and GDE, the intraclass neighbor parameter k1 is set to 1 and the interclass neighbor parameter k2 is set to 40.

Table 1 reports the maximal recognition rates of different methods on the ORL dataset. It can be seen that the recognition performance of our method is better than that of the other methods under all classifiers, especially GDE. When unlabeled samples exist in the dataset, the introduced L1-norm sparsity constraint makes our extension method more effective in classifying these samples. Furthermore, the regularization parameter β, which weights the terms of the discriminant criterion function, is critical to the recognition performance: too large a value of β degrades the recognition rate. In addition, Fig. 1 shows the recognition rate curves of all methods under different classifiers as the number of projection axes varies. It can easily be seen that all methods perform best under LLRC and that our extension method achieves the best performance, which further demonstrates its effectiveness.

Table 1. Maximal recognition rates (%) and corresponding dimensions (shown in the parentheses) of different methods under four classifiers on ORL dataset.
Fig. 1. Recognition rate curves of all methods under different classifiers when the number of labeled images per class is r = 3. (a) NNC; (b) SVM; (c) LRC; (d) LLRC.

4.2 Experiments on AR Dataset

The AR face dataset contains over 3000 face images of 120 people. All images are manually cropped to 50 × 40 pixels. On the AR dataset, we only use the images without occlusion: the first 7 images of each person are taken for training and images 14–20 are used for testing. Furthermore, the first r (r = 3, 4, 5) images of each person are labeled. NNC, SVM, LRC and LLRC are employed for classification. Before the experiments, the dimension of the PCA subspace is set to 150. The parameters are set as follows: in our method, the interclass neighbor parameter k is 6 and the tuning parameter t of the adjacency matrices is chosen as 2⁴; the neighbor parameter k in LPP is set to 6; for MFA and GDE, the intraclass neighbor parameter k1 is set to 1 and the interclass neighbor parameter k2 is set to 120.

Table 2 reports the maximal recognition rates of different methods on the AR dataset. The results indicate that, compared with GDE, our method brings only a small improvement (<1%) under the four classifiers, but the recognition performance of our extension method has obvious advantages when unlabeled samples exist in the dataset. Moreover, the more labeled samples there are, the better the performance our extension method achieves. In addition, the recognition rate curves of different methods under the various classifiers as the number of projection axes varies are shown in Fig. 2. We can see that LPP performs the worst, whereas with the regularization parameter β set to 0.3 the curves in Fig. 2 show that our extension method performs better than the other methods, which indicates that the L1-norm sparsity constraint we introduced can mitigate the influence of unlabeled samples on the results.

Table 2. Maximal recognition rates (%) and corresponding dimensions of different methods under four classifiers on AR dataset.
Fig. 2. Recognition rate curves of all methods under different classifiers when the number of labeled images per class is r = 5. (a) NNC; (b) SVM; (c) LRC; (d) LLRC.

4.3 Experiments on FERET Dataset

The FERET face dataset contains over 1000 face images of 200 people. All images are in grayscale and manually cropped to 80 × 80 pixels. On the FERET dataset, the first 4 images of each person are used for training and the remaining images for testing. As in the ORL experiment, the first r (r = 2, 3) images are labeled. NNC, SVM, LRC and LLRC are chosen for classification. The dimension of the PCA subspace is set to 150. The parameters of this experiment are set as follows: in our method, the interclass neighbor parameter k is set to 3 and the tuning parameter t of the adjacency matrices is 2³; for LPP, the neighbor parameter k is set to 3; for MFA and GDE, the intraclass neighbor parameter k1 is set to 1 and the interclass neighbor parameter k2 is 200.

Table 3 reports the maximal recognition rates of different methods on the FERET dataset. The overall recognition rates of all methods decrease on this dataset, but compared with the other methods, our method still has certain advantages. Furthermore, Fig. 3 presents the recognition rate curves of all methods under the four classifiers as the number of projection axes varies. The results show that, compared with the other three classifiers, all methods achieve better results with LLRC, and when β is set to 0.3, the curve of our extension method reaches the best performance in most cases, which again validates its effectiveness.

Table 3. Maximal recognition rates (%) and corresponding dimensions of different methods under four classifiers on FERET dataset.
Fig. 3. Recognition rate curves of different methods under different classifiers when the number of labeled samples per class is r = 3. (a) NNC; (b) SVM; (c) LRC; (d) LLRC.

4.4 Experiments Under Noise Condition

To evaluate robustness to noise, in this section Gaussian noise (mean 0, variance 0.05), salt & pepper noise (noise density 0.1) and multiplicative noise (mean 0, variance 0.1) are applied to the face samples of the ORL, AR and FERET datasets. Face samples under the three noise types are shown in Fig. 4. The recognition rates of the different methods on the three datasets are compared in Tables 4, 5 and 6.
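The paper does not state the exact noise generator; the following sketch reproduces comparable corruptions with skimage.util.random_noise, whose parameter names only approximate the settings above and are assumptions.

```python
import numpy as np
from skimage.util import random_noise

def corrupt(img, kind):
    """Apply one of the three noise types of Sect. 4.4 to an image scaled to [0, 1]."""
    img = img.astype(float)
    if kind == "gaussian":                # mean 0, variance 0.05
        return random_noise(img, mode="gaussian", mean=0.0, var=0.05)
    if kind == "salt_pepper":             # noise density 0.1
        return random_noise(img, mode="s&p", amount=0.1)
    if kind == "multiplicative":          # speckle: img + img * n, n ~ N(0, 0.1)
        return random_noise(img, mode="speckle", mean=0.0, var=0.1)
    raise ValueError(f"unknown noise type: {kind}")
```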

Fig. 4. Face samples under three different noise types on the ORL, AR and FERET datasets.

Table 4. Maximal recognition rates (%) of different methods on ORL dataset under different noise conditions (r = 3).
Table 5. Maximal recognition rates (%) of different methods on AR dataset under different noise conditions (r = 5).
Fig. 5. Face samples under the two blur conditions on the ORL, AR and FERET datasets.

Table 6. Maximal recognition rates (%) of different methods on FERET dataset under different noise conditions (r = 3).

It can be seen from Tables 4, 5 and 6 that, under the three noise conditions, our method performs better than LDA, LPP, MFA and GDE, owing to the improved definition of the similarities of intraclass samples. In particular, since the low rank representation with the L1-norm sparsity constraint is introduced, the recognition rate of our extension method shows obvious advantages under the three noise conditions, which further demonstrates its anti-noise ability. Furthermore, as the number of samples increases and the faces become more complex, our method still performs robustly under the three noise conditions.

4.5 Experiments Under Blur Condition

Considering that blurred images may exist in a dataset, in this section motion blur (motion angle 10°, motion distance 5) and defocus blur (radius 5) are applied to the face images of the ORL, AR and FERET datasets. Figure 5 shows face samples under the different blur conditions. The recognition rates of all methods on the three datasets are compared in Tables 7, 8 and 9.
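For reference, the two blur conditions can be simulated with simple convolution kernels; the rasterized motion kernel below only approximates MATLAB's fspecial('motion', 5, 10), and the defocus kernel is a uniform disk of radius 5. These helpers are illustrative, not the exact setup used in the experiments.

```python
import numpy as np
from scipy.ndimage import convolve

def motion_kernel(length=5, angle_deg=10.0):
    """Approximate linear motion-blur kernel (distance 5, angle 10 degrees)."""
    k = np.zeros((length, length))
    c = (length - 1) / 2.0
    a = np.deg2rad(angle_deg)
    for s in np.linspace(-c, c, 5 * length):          # rasterize a short line segment
        k[int(round(c - s * np.sin(a))), int(round(c + s * np.cos(a)))] = 1.0
    return k / k.sum()

def defocus_kernel(radius=5):
    """Uniform disk kernel of radius 5 (defocus blur)."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = (x ** 2 + y ** 2 <= radius ** 2).astype(float)
    return k / k.sum()

def blur(img, kernel):
    return convolve(img.astype(float), kernel, mode="nearest")
```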

Table 7. Maximal recognition rates (%) of different methods on ORL dataset under different blur conditions.
Table 8. Maximal recognition rates (%) of different methods on AR dataset under different blur conditions.
Table 9. Maximal recognition rates (%) of different methods on FERET dataset under different blur conditions.

According to the results in Tables 7, 8 and 9, our method has an advantage over the other methods in recognition rate under both blur conditions. Moreover, our extension method further exploits the sparsity-constrained low-rank representation, which makes it perform better than GDE. However, as the number of samples increases and the faces become more complex, the recognition performance degrades. It can be seen from Table 9 that when defocus-blurred images are present, the maximal recognition rates of all methods are low, not exceeding 40%, but our extension method still retains a slight advantage, which indicates its anti-interference ability.

5 Conclusion

In this paper, we propose a novel method to eliminate the influence of outliers in face recognition. It builds the intrinsic graph from the intraclass samples and their within-class mean, and constructs the penalty graph from the perspective of GDE. In addition, a low-rank representation with a sparsity constraint is introduced to explore the global low-rank relationships of unlabeled samples. Experimental results on three datasets under different noise and blur conditions demonstrate the effectiveness and robustness of the proposed method.