Abstract
In this paper, we propose a linear discriminant approach, namely global–local Fisher discriminant analysis (GLFDA), that explicitly considers both the local and global discriminant structures embedded in data. Specifically, GLFDA constructs two graphs to model the global and local discriminant structures, respectively, and then incorporates these discriminant structures, together with the local intrinsic structure that characterizes within-class compactness, into the objective function for dimensionality reduction. Thus, GLFDA well encodes the discriminant information of data, especially the local discriminant information. Experimental results on the AR, YALE, and UMIST databases show the effectiveness of the proposed algorithm.
1 Introduction
Many previous works have demonstrated that manifold learning can effectively improve image recognition accuracy [1–6]. Manifold learning techniques construct an adjacency graph over the training data to model the geometrical relationships among data and map nearby points in the high-dimensional data space to nearby points in the reduced space. They intuitively preserve the intrinsic structure of data and have been widely used in image recognition, clustering, image retrieval, and data classification. Locality Preserving Projection (LPP) [1, 7] is one of the prevalent approaches and has proven its effectiveness in face recognition, document clustering, and image retrieval. However, LPP neglects the labels of samples when learning the intrinsic structure of data, which may impair recognition accuracy.
Motivated by LPP and linear discriminant analysis (LDA) [8], many discriminant approaches based on manifold learning have been developed for linear dimensionality reduction [9–14], among which the representative approaches are Marginal Fisher Analysis (MFA) [2], Local Discriminant Embedding (LDE) [9], and Local Fisher Discriminant Analysis (LFDA) [11, 12]. MFA, LFDA, and LDE represent intraclass compactness via LPP [7], which maps nearby points in the observed data space to nearby points in the low-dimensional representation. MFA and LDE represent between-class separability by maximizing the sum of distances among nearby data with different labels, and thus characterize the local discriminant structure of data. In contrast, LFDA represents interclass separability by maximizing the sum of distances between data from different classes, and thus detects the global discriminant structure of data. Moreover, data pairs with large distances dominate this global discriminant structure, which may impair the local discriminant structure.
Although the motivations of the above-mentioned discriminant approaches differ, they can be unified within the Graph Embedding Framework (GEF) proposed by Yan et al. [2]. GEF constructs two adjacency graphs, namely the intrinsic graph and the penalty graph, to model the intrinsic and discriminant geometric structures of the data, respectively. However, GEF considers only the local or the global discriminant structure of data, not both simultaneously. In real cases, the intrinsic structure of data is unknown and complex, and testing data usually differ from the training data due to many factors. Thus, the local or global discriminant structure alone is not sufficient to guarantee recognition accuracy [3, 15–17], which indicates that the recognition accuracy of the above-mentioned discriminant approaches on testing data leaves room for improvement.
In this paper, motivated by [17], we propose a novel discriminant approach, namely global–local Fisher discriminant analysis (GLFDA), for dimensionality reduction. Specifically, GLFDA constructs two adjacency graphs to model the local intrinsic geometric structure and the discriminant geometric structure of data, respectively, and then integrates the global discriminant structure learned by LDA, the local discriminant structure, and the intrinsic structure into the objective function for dimensionality reduction. Experimental results on the AR, YALE, and UMIST databases demonstrate the effectiveness of the proposed approach.
The rest of the paper is organized as follows. Section 2 introduces our GLFDA approach. Section 3 provides some experimental results. We conclude the paper in Sect. 4.
2 Global–local Fisher discriminant analysis
2.1 Motivation
According to the above analysis, the graph embedding framework encodes only the local or the global discriminating information, which reduces the flexibility of the algorithm and impairs recognition accuracy. In this section, we develop a novel dimensionality reduction approach, namely GLFDA, which explicitly considers both the global and local discriminating information of data. Specifically, we construct three adjacency graphs, namely the intrinsic graph, the global penalty graph, and the local penalty graph, to model the intrinsic and discriminant geometrical structures of the data manifold. The global penalty graph encodes the global discriminating information and maps large-distance pairs from different classes to be far apart in the reduced space. The local penalty graph encodes the local discriminating information and maps nearby points from different classes to be far apart in the reduced space. Thus, the discriminant structure of data can be effectively characterized by the local and global penalty graphs. The intrinsic graph maps nearby points from the same class to nearby points in the low-dimensional representation. In the following section, we first describe the intrinsic graph.
2.2 Intrinsic graph
Suppose we have a set of p-dimensional samples X = {x 1, x 2, ···, x n } belonging to C classes. We construct an adjacency graph \(G_{w} = \left\{ {X,W} \right\}\), namely the intrinsic graph, with vertex set X and weight matrix W to learn the similarity of data from the same class, which characterizes the intrinsic structure. The elements W ij of W are defined as

$$W_{ij} = \begin{cases} A_{ij}/n_{c}, & \tau_{i} = \tau_{j} = c \\ 0, & \tau_{i} \neq \tau_{j} \end{cases} \quad (1)$$

where A ij = 1 if x j is among the k 1 nearest neighbors of x i or vice versa and A ij = 0 otherwise, n c denotes the number of samples in class c, and τ i and τ j denote the class labels of x i and x j , respectively.
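As an illustrative sketch (not the authors' code), the intrinsic weight matrix described above can be assembled as follows; the brute-force neighbor search and the helper name `intrinsic_weights` are our own:

```python
import numpy as np

def intrinsic_weights(X, labels, k1):
    """Build the intrinsic weight matrix W of G_w.

    A_ij = 1 if x_j is among the k1 nearest neighbors of x_i or vice versa;
    W_ij = A_ij / n_c if x_i and x_j both belong to class c, else 0.
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances, self-distance excluded
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, np.argsort(d2[i])[:k1]] = 1
    A = np.maximum(A, A.T)                 # symmetrize: "or vice versa"
    same = labels[:, None] == labels[None, :]
    counts = np.bincount(labels)           # n_c for each class c
    return np.where(same, A / counts[labels][:, None], 0.0)

# toy demo: two well-separated 2-D classes
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
W = intrinsic_weights(X, labels, k1=1)
```

Because two same-class points share the same class size n_c, the resulting W is symmetric, as the objective function (2) requires.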
The intrinsic graph aims to map nearby points from the same class in the observed space to nearby points in the reduced space. A reasonable criterion for choosing a good map is to minimize the following objective function

$$\min_{\alpha} \frac{1}{2}\sum_{i,j} \left( y_{i} - y_{j} \right)^{2} W_{ij} \quad (2)$$

where y i = α T x i denotes the one-dimensional representation of x i and α is the transformation vector.
The objective function (2) with our choice of symmetric weight matrix W incurs a heavy penalty if nearby points x i and x j , which share the same label, are mapped far apart. Therefore, minimizing Eq. (2) is an attempt to ensure that if x i and x j are close, then y i and y j are also close. Thus, the intrinsic structure of the data manifold, which characterizes the similarity of data, is well preserved by Eq. (2).
2.3 Local penalty graph
Likewise, we construct an adjacency graph G B = {X, B}, namely the local penalty graph, with vertex set X and weight matrix B to encode the discriminating information embedded in nearby data from different classes. The elements B ij of B are defined as

$$B_{ij} = \begin{cases} 1, & \tau_{i} \neq \tau_{j} \text{ and } \left( x_{j} \in N_{k_{2}}\left( x_{i} \right) \text{ or } x_{i} \in N_{k_{2}}\left( x_{j} \right) \right) \\ 0, & \text{otherwise} \end{cases} \quad (3)$$

where \(N_{{k_{2} }} \left( {x_{i} } \right)\) denotes the set of k 2 nearest neighbors of x i .
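The local penalty matrix can be built the same way. The rule below (B_ij = 1 for cross-class pairs that lie within each other's k 2 -neighborhoods, 0 otherwise) follows the description of G B; the helper name `local_penalty_weights` is ours:

```python
import numpy as np

def local_penalty_weights(X, labels, k2):
    """Build the local penalty matrix B of G_B: B_ij = 1 if x_i and x_j
    have different labels and one is among the k2 nearest neighbors of
    the other, else 0."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, np.argsort(d2[i])[:k2]] = 1
    A = np.maximum(A, A.T)                 # symmetrize the neighborhood relation
    diff = labels[:, None] != labels[None, :]
    return np.where(diff, A, 0.0)

# toy demo: interleaved labels so some neighbors cross class boundaries
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [5.0, 5.0]])
labels = np.array([0, 1, 0, 1])
B = local_penalty_weights(X, labels, k2=1)
```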
The local penalty graph aims to find a map such that the connected data in the adjacency graph G B stay as distant as possible. Suppose y i is the map of training datum x i (i = 1, 2, ···, n); a reasonable map optimizes the following objective function:

$$\max_{\alpha} \frac{1}{2}\sum_{i,j} \left( y_{i} - y_{j} \right)^{2} B_{ij} \quad (4)$$
The objective function (4) on the graph G B incurs a heavy penalty if two nearby points x i and x j are mapped close together while they actually belong to different classes. So, maximizing Eq. (4) is an attempt to ensure that if x i and x j are close but have different labels, then y i and y j are far apart. Thus, Eq. (4) well detects the local discriminant geometrical structure of data and encodes the local discriminating information.
Obviously, the objective function (4) does not explicitly consider pairs of points that are outside each other's local neighborhoods and have different labels. In other words, such points may be mapped close to each other in the reduced space. This impairs the global discriminant structure of data and degrades the recognition accuracy of the algorithm. In Sect. 2.4, we construct a global graph, namely the global penalty graph, to address this problem.
2.4 Global penalty graph
Motivated by Fisher discriminant analysis, we construct a global graph, namely global penalty graph, G g = {X, F} with a vertex set X and weight matrix F to model the global discriminant structure that encodes the global discriminating information of data. The elements F ij in F can be defined as follows:
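Since the printed form of Eq. (5) is not reproduced above, the sketch below uses one simple Fisher-style choice, F_ij = 1/n for pairs with different labels and 0 otherwise, purely as an assumption; the paper's exact weighting and the helper name `global_penalty_weights` may differ from the authors':

```python
import numpy as np

def global_penalty_weights(labels):
    """Assumed form of the global penalty matrix F of G_g:
    F_ij = 1/n if tau_i != tau_j, else 0. Unlike B, every cross-class
    pair is penalized, not only neighboring ones."""
    n = len(labels)
    diff = labels[:, None] != labels[None, :]
    return diff / n

labels = np.array([0, 0, 1, 1])
F = global_penalty_weights(labels)
```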
The global penalty graph seeks a map such that points with different labels are mapped far apart in the reduced space. Suppose y i is the map of training datum x i (i = 1, 2, ···, n); a reasonable map optimizes the following objective function:

$$\max_{\alpha} \frac{1}{2}\sum_{i,j} \left( y_{i} - y_{j} \right)^{2} F_{ij} \quad (6)$$
The objective function (6) on the graph G g incurs a heavy penalty if two points x i and x j are mapped close together while they actually belong to different classes. So, maximizing Eq. (6) is an attempt to ensure that if x i and x j have different labels, then y i and y j are far apart. Thus, we can well detect the global discriminant geometrical structure of data by Eq. (6).
Moreover, the objective function (6) emphasizes large-distance pairs and guarantees that the larger the distance between two points, the farther apart they are embedded in the reduced space. This means that Eq. (6) keeps points that lie outside each other's local neighborhoods and have different labels distant in the reduced space, which effectively avoids the disadvantage of Eq. (4).
2.5 Objective function
In this section, we describe the GLFDA approach, which combines the objective functions (2), (4), and (6). Suppose α is a projection vector. Substituting y i = α T x i into the objective function (2) and following some simple algebraic steps, we can see that

$$\frac{1}{2}\sum_{i,j} \left( y_{i} - y_{j} \right)^{2} W_{ij} = \alpha^{T} X L_{w} X^{T} \alpha \quad (7)$$

where L w = D w − W and D w is a diagonal matrix whose entries are the column (or row, since W is symmetric) sums of W, i.e., D w (i, i) = ∑ j W ij .
Substituting y i = α T x i into the objective function (4), we see that

$$\frac{1}{2}\sum_{i,j} \left( y_{i} - y_{j} \right)^{2} B_{ij} = \alpha^{T} X L_{b} X^{T} \alpha \quad (8)$$

where L b = D b − B and D b is a diagonal matrix whose entries are the column (or row, since B is symmetric) sums of B, i.e., D b (i, i) = ∑ j B ij .
Likewise, substituting y i = α T x i into the objective function (6), we see that

$$\frac{1}{2}\sum_{i,j} \left( y_{i} - y_{j} \right)^{2} F_{ij} = \alpha^{T} X L_{f} X^{T} \alpha \quad (9)$$

where L f = D f − F and D f is a diagonal matrix whose entries are the column (or row, since F is symmetric) sums of F, i.e., D f (i, i) = ∑ j F ij .
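The algebra behind Eqs. (7)–(9) rests on the standard graph Laplacian identity ½ ∑ ij W ij (y i − y j )² = y T L y; a quick numerical check:

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W, with D(i, i) = sum_j W_ij,
    as used in Eqs. (7)-(9)."""
    return np.diag(W.sum(axis=1)) - W

# verify (1/2) * sum_ij W_ij (y_i - y_j)^2 == y^T L y on random data
rng = np.random.default_rng(0)
W = rng.random((5, 5))
W = (W + W.T) / 2                      # the weight matrices are symmetric
y = rng.random(5)
L = laplacian(W)
lhs = 0.5 * sum(W[i, j] * (y[i] - y[j]) ** 2
                for i in range(5) for j in range(5))
rhs = y @ L @ y
```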
With Eqs. (7), (8), and (9) standing in for Eqs. (2), (4), and (6), respectively, there are several ways to build the objective function of GLFDA by integrating them. The representative one is the Rayleigh quotient form, which is formally similar to LDA [8]. Thus, we can write the objective function of GLFDA as follows:
where t ∊ [0, 1] is a parameter that balances the global and local discriminant information of data, S f = XL f X T is the global between-class scatter matrix, S b = XL b X T is the local between-class scatter matrix, and S w = XL w X T is the local within-class scatter matrix.
The optimal projection vector α that maximizes Eq. (10) is given by the maximum eigenvalue solution to the generalized eigenvalue problem:
Let the column vectors α 1, α 2, ···, α d be the solutions of (11), ordered according to their eigenvalues λ 1 ≥ λ 2 ≥ ··· ≥ λ d > 0; then the projection matrix P can be written as P = [α 1, α 2, ···, α d ].
Note that, in face recognition, S w is often singular, so the optimal projection matrix cannot be directly calculated from Eq. (11). Many approaches have been developed to solve for the optimal projection vector α in this case. In the experiments, we simply adopt the same strategy as Fisherface [8] for its simplicity: PCA is first used to reduce the dimension so that XL w X T is nonsingular, and Eq. (11) is then solved in the PCA space.
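Putting the pieces together, a minimal solver might look as follows. The combination t·S f + (1 − t)·S b is an assumed reading of Eq. (10), the small ridge added to S w stands in for the PCA preprocessing described above, and `glfda_directions` is a hypothetical helper, not the authors' implementation:

```python
import numpy as np

def glfda_directions(X, Lw, Lb, Lf, t=0.5, d=1, reg=1e-8):
    """Sketch of the GLFDA solver. X is p x n (one sample per column);
    Lw, Lb, Lf are the Laplacians of the intrinsic, local penalty, and
    global penalty graphs. The weighting t*Sf + (1-t)*Sb is an assumed
    form of Eq. (10); the paper's exact combination may differ."""
    Sw = X @ Lw @ X.T                  # local within-class scatter
    Sb = X @ Lb @ X.T                  # local between-class scatter
    Sf = X @ Lf @ X.T                  # global between-class scatter
    # generalized eigenproblem (t*Sf + (1-t)*Sb) a = lambda * Sw a,
    # solved via Sw^{-1} M; a tiny ridge keeps Sw invertible (in face
    # recognition one would first reduce the dimension with PCA instead)
    M = np.linalg.solve(Sw + reg * np.eye(Sw.shape[0]), t * Sf + (1 - t) * Sb)
    evals, evecs = np.linalg.eig(M)
    order = np.argsort(evals.real)[::-1]   # largest eigenvalues first
    return evecs[:, order[:d]].real        # projection matrix P, p x d

# toy demo: two classes separated along the first axis
X = np.array([[0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
              [0.0, 0.1, 0.0, 0.0, 0.1, 0.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
same = labels[:, None] == labels[None, :]
W = (same & ~np.eye(6, dtype=bool)).astype(float)  # intrinsic graph
B = (~same).astype(float)                          # local penalty (all cross pairs here)
F = (~same) / 6.0                                  # assumed global penalty
lap = lambda A: np.diag(A.sum(1)) - A
P = glfda_directions(X, lap(W), lap(B), lap(F), t=0.5, d=1)
```

On this toy set, the leading projection direction should align mostly with the first axis, along which the two classes are separated.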
2.6 Algorithm
Given a training data set X = [x 1 x 2 ··· x n ] from C classes, the algorithmic procedure of GLFDA is summarized as follows.
2.6.1 Construct intrinsic and penalty graphs
Suppose that G w = {X, W}, \(G_{\text{B}} = \left\{ {X,B} \right\}\), and \(G_{\text{g}} = \left\{ {X,F} \right\}\) denote local intrinsic graph, local penalty graph, and global penalty graph, respectively. The weighted matrices W, B, and F are calculated by Eqs. (1), (3), and (5), respectively.
2.6.2 Calculate scatter matrices
In this step, we calculate scatter matrices S w , S b , and S f by Eqs. (7), (8), and (9), respectively.
2.6.3 PCA projection
In this step, we project the data points x i (i = 1, ···, n) into the PCA subspace. Suppose the projection matrix of PCA is W PCA ∊ R p×l, whose columns are the l leading eigenvectors of the covariance matrix of the data. After PCA projection, the scatter matrices S w , S b , and S f become \(\tilde{S}_{w} = W^{T}_{\text{PCA}} XL_{w} X^{T} W_{\text{PCA}}\), \(\tilde{S}_{b} = W^{T}_{\text{PCA}} XL_{b} X^{T} W_{\text{PCA}}\), and \(\tilde{S}_{f} = W^{T}_{\text{PCA}} XL_{f} X^{T} W_{\text{PCA}}\), respectively.
2.6.4 Compute the projection matrix
Solve the following generalized eigenvector problem:
Let α 1, α 2, ···, α d be the solutions of the above equation, ordered according to their eigenvalues, λ 1 ≥ λ 2 ≥ ··· ≥ λ d . Thus, the projection matrix of GLFDA is P S = [α 1 α 2⋯ α d ].
2.6.5 Linear embedding
In this step, we compute the linear projections. Given an arbitrary training vector x i , the linear embedding is

$$y_{i} = P_{S}^{T} x_{i}$$
3 Experiments
In this section, we evaluate the proposed approach on three well-known face databases (YALE, UMIST, and AR) and compare its performance with Fisherface [8], MFA [2], SDA [17], DLA [18], and LFDA [11]. In the experiments, we use the Euclidean metric and the nearest-neighbor classifier for classification because of their simplicity. In real cases, it is difficult to select a suitable neighbor parameter k for the MFA, LFDA, DLA, and GLFDA approaches due to the complex distribution of training data; the same problem is encountered in many manifold learning approaches. In the experiments, we empirically determined a proper neighbor parameter k within the interval [1, n − 1] for all approaches, where n denotes the number of training samples. The parameter t depends on the distribution of the data; we set t to 0.5, 0.1, and 0.4 for the YALE, UMIST, and AR databases, respectively.
The Yale face database [19] contains 165 images of 15 individuals (11 images per person) under various facial expressions and lighting conditions. In our experiments, each image is manually cropped and resized to 32 × 32 pixels. Figure 1 shows some images of one person. The first 3, 6, and 9 images per person are selected for training, respectively, and the remaining images are used for testing. Thus, we have three training sets containing 45, 90, and 135 images, and the corresponding testing sets containing 120, 75, and 30 images. The top recognition accuracies of the six approaches are shown in Table 1. Figure 2 plots the recognition accuracy versus the number of projection vectors when the first 6 images per person are selected for training.
The UMIST database [20] is a multi-view database consisting of 564 images of 20 individuals under various poses. Each image is of size 112 × 92. In the experiments, the first 19 images per person are selected as the gallery. Figure 3 shows some images of one subject. The first 3, 6, and 9 images per person are selected as training images, respectively, and the remaining images serve as the corresponding testing images. Thus, we have three training sets containing 60, 120, and 180 images, and the corresponding testing sets containing 320, 260, and 200 images, respectively. The top recognition accuracies of the six approaches are shown in Table 2. Figure 4 plots the recognition accuracy versus the number of projection vectors of the six approaches when the first 3 images per person are selected for training.
The AR face database [21], established at Purdue University, contains over 4,000 color face images of 126 people (70 men and 56 women), including frontal views with different facial expressions, lighting conditions, and occlusions. The pictures of most persons were taken in two sessions separated by 2 weeks. Each session contains 13 color images, and 120 individuals (65 men and 55 women) participated in both sessions. In the experiments, the facial portion of each image is manually cropped and then normalized to a size of 50 × 40. The images from the first session with (a) “neutral expression”, (b) “smile”, (c) “anger”, (d) “scream”, (e) “left light on”, (f) “right light on”, and (g) “both side lights on” are selected for training, and the corresponding second-session images, i.e., (n)–(t), are selected for testing. Thus, the training and testing sets each contain 840 images. Figure 5 shows some sample images of one subject. Table 3 lists the top recognition accuracies of the six approaches. Figure 6 plots the recognition accuracy versus the number of projection vectors of the six approaches.
From Tables 1, 2 and 3, Figs. 2, 4, and 6, we can see that:
1. Our GLFDA approach is superior to Fisherface. This is probably because Fisherface neither well encodes the discriminative information embedded in nearby data from different classes nor preserves the local intrinsic geometrical structure embedded in nearby data from the same class, which impairs its recognition accuracy.
2. The top recognition accuracy of GLFDA is also superior to that of MFA and DLA. This is probably because MFA and DLA only encode the discriminative information embedded in nearby points from different classes and do not explicitly consider the global discriminative information embedded in points with different labels, which impairs recognition accuracy.
3. GLFDA is also superior to LFDA. This is probably because LFDA only detects the global discriminant structure, mapping points from different classes far apart in the reduced space through a quadratic function. This quadratic function emphasizes large-distance pairs and may impair the local discriminant information of data, leading to inseparability among nearby data from different classes.
4. SDA is inferior to GLFDA. Although SDA considers both the global and local geometrical structures of data, it ignores the local discriminant geometrical structure. This may impair the local discriminating information of data and lead to inseparability among nearby data with different labels. In Fig. 4, the recognition accuracy of GLFDA is not obviously better than that of SDA, probably because GLFDA ignores the non-local intrinsic structure of data.
5. GLFDA is superior to the other discriminant approaches. The reason is probably that the other approaches only encode the global or the local discriminant information of data. In real applications, the intrinsic structure of data is unknown and complex, and a single structure is not sufficient to guarantee discrimination ability in the reduced space. Unlike these approaches, GLFDA explicitly considers both the global and local discriminant information of data and can thus well separate data with different class labels in the reduced space. We randomly generated some two-dimensional data points using MATLAB and show the projection directions of GLFDA, Fisherface, and MFA in Fig. 7a, with the corresponding one-dimensional embedding results in Fig. 7b–d, respectively. It is easy to see that GLFDA encodes more of the discriminant information embedded in the data than Fisherface and MFA.
4 Conclusion
In this paper, we propose a novel dimensionality reduction approach, namely GLFDA. GLFDA explicitly considers the local intrinsic structure, the global discriminant structure, and the local discriminant structure of the data manifold by constructing three adjacency graphs and then incorporates them into one objective function for dimensionality reduction. In this way, GLFDA well encodes the discriminative information. Experiments on the YALE, UMIST, and AR databases show the effectiveness of the proposed GLFDA approach.
References
He X, Yan S, Hu Y, Niyogi P, Zhang HJ (2005) Face recognition using Laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340
Yan S, Xu D, Zhang B, Zhang H, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51
Gao Q, Xu H, Li Y, Xie D (2010) Two-dimensional supervised local similarity and diversity projection. Pattern Recognit 43(10):3359–3363
Wang J, Yang W, Yang J (2013) Face recognition using fuzzy maximum scatter discriminant analysis. Neural Comput Appl 23:957–964
Xu Y, Feng G, Zhao Y (2009) One improvement to two-dimensional locality preserving projection method for use with face recognition. Neurocomputing 73(1–3):245–249
Xu Y, Zhong A, Yang J, Zhang D (2010) LPP solution schemes for use with face recognition. Pattern Recognit 43(12):4165–4176
He X, Niyogi P (2003) Locality preserving projections. In: Proceedings of NIPS, vol 16, pp 153–160
Belhumeur PN, Hepanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
Chen HT, Chang HW, Liu TL (2005) Local discriminant embedding and its variants. IEEE Conf Comput Visual Pattern Recognit (CVPR) 2:846–853
Fan Z, Xu Y, Zhang D (2011) Local linear discriminant analysis framework using sample neighbors. IEEE Trans Neural Netw 22(7):1119–1132
Sugiyama M (2006) Local fisher discriminant analysis for supervised dimensionality reduction. In: Proceedings of international conference on machine learning (ICML), vol 23, pp 905–912
Sugiyama M, Roweis S (2007) Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis. J Mach Learn Res 8:1027–1061
Cai D, He X, Zhou K, Han J, Bao H (2007) Locality sensitive discriminant analysis. In: Proceedings of joint conference on artificial intelligence (IJCAI), vol 20, pp 708–713
Lin F, Jiao LC, Shang F, Wang S, Hou B (2013) Fast Fisher sparsity preserving projections. Neural Comput Appl 23:691–705
Gao Q, Liu J, Zhang H, Hou J, Yang X (2012) Enhanced fisher discriminant criterion for image recognition. Pattern Recognit 45:3717–3724
Chen J, Ye J, Li Q (2007) Integrating global and local structures: a least squares framework for dimensionality reduction. IEEE Conf Comput Visual Pattern Recognit (CVPR) 1:1–8
Cai D, He X, Han J (2007) Semi-supervised discriminant analysis. IEEE Conf Comput Vision (ICCV) 11:1–8
Zhang T, Tao D, Li X, Yang J (2009) Patch alignment for dimensionality reduction. IEEE Trans Knowl Data Eng 21(9):1299–1313
The Yale Face Database, http://cvc.yale.edu/projects/yalefaces/yalefaces.html
The UMIST database, http://images.ee.umist.ac.uk/danny/database.html
The AR face database, http://rvll.ecn.purdue.edu/~aleix/aleix_face_DB.html
Acknowledgments
We would like to thank the anonymous reviewers and AE for their constructive comments and suggestions. This work is supported by the National Natural Science Foundation of China under Grant 61271296, the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2012JM8002, China Postdoctoral Science Foundation under Grant 2012M521747, and the 111 Project of China (B08038).
Wang, Q., Hu, X., Gao, Q. et al. Global–local fisher discriminant approach for face recognition. Neural Comput & Applic 25, 1137–1144 (2014). https://doi.org/10.1007/s00521-014-1592-2