1 Introduction

In pattern recognition, the data often lie in a high-dimensional space. In order to analyze such data, we need to reduce the dimensionality and find the most important features. Feature extraction is an essential technique for this problem, and it has attracted much attention in computer vision and pattern recognition. In the past decades, numerous feature extraction methods have been proposed [1–12]. Linear discriminant analysis (LDA) [13] is one of the most popular feature extraction and dimensionality reduction methods and has been widely applied to high-dimensional pattern recognition problems [14–16]. It seeks the optimal projection vectors by simultaneously maximizing the between-class scatter and minimizing the within-class scatter. The criterion of LDA is to maximize

$$ J_{F} (W) = \frac{{J_{b} (W)}}{{J_{w} (W)}} = \frac{{{\text{trace}}(W^{T} S_{b} W)}}{{{\text{trace}}(W^{T} S_{w} W)}}, $$
(1)

where \( S_{b} \) is the between-class scatter matrix, \( S_{w} \) is the within-class scatter matrix, and W is the projection matrix composed of the optimal projection vectors. Suppose that we have M training samples from c classes; then \( S_{b} \), \( S_{w} \), and the global scatter matrix \( S_{t} \) are defined by

$$ S_{b} = \frac{1}{M}\sum\limits_{i = 1}^{c} {l_{i} (m_{i} - m)} (m_{i} - m)^{T} , $$
(2)
$$ S_{w} = \frac{1}{M}\sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{{l_{i} }} {(x_{ij} - m_{i} )(x_{ij} - m_{i} )^{T} } } $$
(3)

and

$$ S_{t} = \frac{1}{M}\sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{{l_{i} }} {(x_{ij} - m)(x_{ij} - m)^{T} } } , $$
(4)

where \( x_{ij} \) is the jth training sample in class i, m is the global centroid, \( m_{i} \) (i = 1, 2, …, c) is the centroid of class i, and \( l_{i} \) (i = 1, 2, …, c) is the number of training samples in class i.
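For concreteness, the scatter matrices of Eqs. (2)–(4) can be computed with a few lines of NumPy. The following is only a minimal sketch; the helper name scatter_matrices and the column-wise data layout are our choices for illustration.

```python
import numpy as np

def scatter_matrices(X, y):
    """Compute S_b, S_w, and S_t of Eqs. (2)-(4).
    X is n x M with one training sample per column; y holds the M class labels."""
    n, M = X.shape
    m = X.mean(axis=1, keepdims=True)               # global centroid
    S_b = np.zeros((n, n))
    S_w = np.zeros((n, n))
    for label in np.unique(y):
        Xi = X[:, y == label]                       # samples of class i
        li = Xi.shape[1]
        mi = Xi.mean(axis=1, keepdims=True)         # class centroid
        S_b += li * (mi - m) @ (mi - m).T
        S_w += (Xi - mi) @ (Xi - mi).T
    S_b /= M
    S_w /= M
    return S_b, S_w, S_b + S_w                      # S_t = S_b + S_w, i.e. Eq. (4)
```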

From formula (3), it is easy to see that the rank of \( S_{w} \) is at most the minimum of the dimensionality of the samples and the number of training samples. When the dimensionality exceeds the number of training samples, \( S_{w} \) is singular and classical LDA cannot be performed. This is the so-called small sample size (SSS) problem, namely the number of training samples is small relative to the dimensionality of the samples. Several approaches have been proposed in the past few years to solve this problem. Among them, regularized LDA (RLDA) [17], PCA (principal component analysis) plus LDA (FDA) [14], null space LDA (NLDA) [18, 19], direct LDA (DLDA) [20], complete LDA (CLDA) [21, 22], inverse Fisher discriminant analysis (IFDA) [23], and LDA based on generalized singular value decomposition (LDA/GSVD) [24, 25] are the most well known. RLDA makes \( S_{w} \) nonsingular by adding a diagonal matrix αI (α > 0) to \( S_{w} \). In FDA, PCA [26] is first performed for dimensionality reduction to make \( S_{w} \) nonsingular before LDA is applied. NLDA assumes that the null space of \( S_{w} \) contains the most useful discriminative information and seeks projection vectors in this null space. DLDA searches for discriminative information in the range space of \( S_{b} \) and solves the Fisher criterion (1) by diagonalizing \( S_{b} \) before diagonalizing \( S_{w} \). CLDA extracts features separately from the null space and the range space of \( S_{w} \). IFDA adds a restriction to the PCA procedure and introduces an inverse Fisher criterion. LDA/GSVD uses the generalized singular value decomposition (GSVD) to solve the singularity problem in classical LDA.

We analyze NLDA and LDA/GSVD theoretically and show their relationship under a condition that usually holds for high-dimensional data. The analysis reveals that there is only a small difference between NLDA and LDA/GSVD: with a slight modification, NLDA becomes equivalent to LDA/GSVD. Since the main disadvantage of LDA/GSVD is its high computational cost, whereas NLDA is much more efficient, we propose a modified NLDA (MNLDA) algorithm in this paper. When the given condition is satisfied, MNLDA has the same discriminating power as LDA/GSVD but is more efficient. In addition, we compare the discriminating power of NLDA and MNLDA: NLDA outperforms MNLDA on some data sets, and MNLDA performs better on others, depending mainly on the characteristics of the data. We give our interpretation of this behavior based on the experiments.

The rest of the paper is organized as follows: Sect. 2 describes NLDA and LDA/GSVD. We analyze the relation between NLDA and LDA/GSVD in Sect. 3. Section 4 proposes the MNLDA algorithm and compares MNLDA, LDA/GSVD, and NLDA. Section 5 describes experiments on the ORL, FERET, and Yale face databases and the Hong Kong Polytechnic University finger-knuckle-print (PolyU FKP) database. Finally, our conclusions are given in Sect. 6.

2 NLDA and LDA/GSVD

2.1 NLDA

Chen et al. [18] proved that the null space of \( S_{w} \) contains the most discriminative information and proposed the NLDA algorithm, which searches for projection vectors in the null space of \( S_{w} \). The criterion of NLDA is defined by

$$ W = \arg \max_{{W^{T} S_{w} W = 0}} {\text{trace}}(W^{T} S_{b} W). $$
(5)

In order to improve the efficiency of the algorithm, the authors of [18] used a pixel grouping method to reduce the dimensionality of the samples before performing NLDA. Based on the observation that the null space of \( S_{t} \) contains no discriminative information, Huang et al. [27] first removed the null space of \( S_{t} \) and then performed NLDA.

The NLDA algorithm is summarized as follows (a code sketch is given after the list):

1. Calculate \( S_{t} \) by (4). Compute \( S_{t} \)'s eigenvectors \( p_{1} , p_{2} , \ldots , p_{d} \) corresponding to nonzero eigenvalues.

2. Calculate \( S_{b} \) and \( S_{w} \) by (2) and (3). Let \( P = (p_{1} , p_{2} , \ldots , p_{d} ) \). We obtain the new scatter matrices \( \tilde{S}_{b} = P^{T} S_{b} P \) and \( \tilde{S}_{w} = P^{T} S_{w} P \) after the removal of the null space of \( S_{t} \).

3. Compute \( \tilde{S}_{w} \)'s eigenvectors \( q_{1} , q_{2} , \ldots , q_{s} \) corresponding to zero eigenvalues and project the data into the null space of \( \tilde{S}_{w} \). The between-class scatter matrix becomes \( \hat{S}_{b} = Q^{T} \tilde{S}_{b} Q \), where \( Q = (q_{1} , q_{2} , \ldots , q_{s} ) \).

4. Compute \( \hat{S}_{b} \)'s eigenvectors \( r_{1} , r_{2} , \ldots , r_{f} \) corresponding to nonzero eigenvalues. Let \( R = (r_{1} , r_{2} , \ldots , r_{f} ) \). The projection matrix of NLDA is \( G_{n} = PQR \).

5. For a sample x, the new NLDA feature is \( y = G_{n}^{T} x \).
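A minimal NumPy sketch of these five steps might look as follows. It reuses the hypothetical scatter_matrices helper from Sect. 1, and the relative eigenvalue thresholds are illustrative rather than part of the original algorithm.

```python
import numpy as np

def nlda(X, y, tol=1e-10):
    """A sketch of the NLDA steps above (samples are stored as the columns of X)."""
    S_b, S_w, S_t = scatter_matrices(X, y)

    # Step 1: eigenvectors of S_t with nonzero eigenvalues (drop the null space of S_t).
    t_vals, t_vecs = np.linalg.eigh(S_t)
    P = t_vecs[:, t_vals > t_vals.max() * tol]

    # Step 2: scatter matrices after the removal of the null space of S_t.
    Sb_r = P.T @ S_b @ P
    Sw_r = P.T @ S_w @ P

    # Step 3: null space of the reduced within-class scatter matrix.
    w_vals, w_vecs = np.linalg.eigh(Sw_r)
    Q = w_vecs[:, w_vals <= w_vals.max() * tol]

    # Step 4: eigenvectors of Q^T Sb_r Q with nonzero eigenvalues.
    b_vals, b_vecs = np.linalg.eigh(Q.T @ Sb_r @ Q)
    R = b_vecs[:, b_vals > b_vals.max() * tol]

    return P @ Q @ R                    # G_n; NLDA features are G_n.T @ x
```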

2.2 LDA/GSVD

Howland et al. [24] applied the GSVD due to Paige et al. [28] to solve the SSS problem of LDA. For n-dimensional data, we define

$$ H_{b} = \left[ {\frac{{\sqrt {l_{1} } }}{\sqrt M }(m_{1} - m),\frac{{\sqrt {l_{2} } }}{\sqrt M }(m_{2} - m), \,\ldots ,\frac{{\sqrt {l_{c} } }}{\sqrt M }(m_{c} - m)} \right] $$
(6)

and

$$ H_{w} = \frac{1}{\sqrt M }[X_{1} - m_{1} e_{1} ,X_{2} - m_{2} e_{2} , \,\ldots,X_{c} - m_{c} e_{c} ], $$
(7)

where \( X_{i} = [x_{i1} ,x_{i2} , \,\ldots,x_{{il_{i} }} ] \in R^{{n \times l_{i} }} \) is the training sample matrix of class i and \( e_{i} = [1,1, \,\ldots,1] \in R^{{1 \times l_{i} }} \). Then \( S_{b} = H_{b} H_{b}^{T} \) and \( S_{w} = H_{w} H_{w}^{T} \). By applying the GSVD to \( H_{b}^{T} \) and \( H_{w}^{T} \), we obtain

$$ U_{b}^{T} H_{b}^{T} V = \left[\overbrace {{\Upgamma_{b} }}^{d},\overbrace {0}^{n - d}\right] $$
(8)

and

$$ U_{w}^{T} H_{w}^{T} V = \left[\overbrace {{\Upgamma_{w} }}^{d},\overbrace {0}^{n - d}\right], $$
(9)

where \( U_{b} \) and \( U_{w} \) are orthogonal, V is nonsingular, \( \Upgamma_{b}^{T} \Upgamma_{b} \) and \( \Upgamma_{w}^{T} \Upgamma_{w} \) are diagonal matrices satisfying \( \Upgamma_{b}^{T} \Upgamma_{b} \) + \( \Upgamma_{w}^{T} \Upgamma_{w} \) = I, and d is the rank of \( \left( {\begin{array}{*{20}c} {H_{b}^{T} } \\ {H_{w}^{T} } \\ \end{array} } \right) \). It follows from (8) and (9) that

$$ (U_{b}^{T} H_{b}^{T} V)^{T} U_{b}^{T} H_{b}^{T} V = V^{T} H_{b} H_{b}^{T} V = V^{T} S_{b} V = [\Upgamma_{b} ,0]^{T} [\Upgamma_{b} ,0] = \left( {\begin{array}{*{20}c} {\Upgamma_{b}^{T} \Upgamma_{b} } & 0 \\ 0 & 0 \\ \end{array} } \right) $$
(10)

and

$$ (U_{w}^{T} H_{w}^{T} V)^{T} U_{w}^{T} H_{w}^{T} V = V^{T} H_{w} H_{w}^{T} V = V^{T} S_{w} V = [\Upgamma_{w} ,0]^{T} [\Upgamma_{w} ,0] = \left( {\begin{array}{*{20}c} {\Upgamma_{w}^{T} \Upgamma_{w} } & 0 \\ 0 & 0 \\ \end{array} } \right). $$
(11)

Because \( \Upgamma_{b}^{T} \Upgamma_{b} \) and \( \Upgamma_{w}^{T} \Upgamma_{w} \) are diagonal and \( \Upgamma_{b}^{T} \Upgamma_{b} \) + \( \Upgamma_{w}^{T} \Upgamma_{w} \) = I, we have

$$ V^{T} S_{b} V = \left( {\begin{array}{*{20}c} {I_{j} } & {} & {} & {} \\ {} & {\Uplambda_{k} } & {} & {} \\ {} & {} & {0_{d - k - j} } & {} \\ {} & {} & {} & {0_{n - d} } \\ \end{array} } \right) $$
(12)

and

$$ V^{T} S_{w} V = \left( {\begin{array}{*{20}c} {0_{j} } & {} & {} & {} \\ {} & {\Upsigma_{k} } & {} & {} \\ {} & {} & {I_{d - k - j} } & {} \\ {} & {} & {} & {0_{n - d} } \\ \end{array} } \right), $$
(13)

where \( \Uplambda_{k} \succ 0 \) and \( \Upsigma_{k} \succ 0 \) are diagonal, \( \Uplambda_{k} + \Upsigma_{k} = I_{k} \), and the subscripts of I, 0, Λ, and Σ denote the sizes of the square matrices.

From (12) and (13), it can be seen that V diagonalizes \( S_{b} \) and \( S_{w} \) simultaneously. The first j + k columns of V, which correspond to the range space of \( S_{b} \), are then selected as projection vectors. Since rank(\( S_{b} \)) = j + k ≤ c − 1, the projection matrix of LDA/GSVD is usually constructed from the first c − 1 columns of V.
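Before summarizing the algorithm, we note that the precursor matrices \( H_{b} \) and \( H_{w} \) of Eqs. (6) and (7) can be assembled directly from the training data. The following NumPy sketch illustrates this; the helper name precursor_matrices and the column-wise layout are our choices for illustration.

```python
import numpy as np

def precursor_matrices(X, y):
    """Assemble H_b and H_w of Eqs. (6)-(7), so that S_b = H_b H_b^T and
    S_w = H_w H_w^T (samples are stored as the columns of X)."""
    M = X.shape[1]
    m = X.mean(axis=1, keepdims=True)               # global centroid
    Hb_cols, Hw_cols = [], []
    for label in np.unique(y):
        Xi = X[:, y == label]                       # samples of class i
        li = Xi.shape[1]
        mi = Xi.mean(axis=1, keepdims=True)         # class centroid
        Hb_cols.append(np.sqrt(li / M) * (mi - m))  # one column per class
        Hw_cols.append((Xi - mi) / np.sqrt(M))      # l_i columns per class
    return np.hstack(Hb_cols), np.hstack(Hw_cols)   # n x c and n x M
```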

The LDA/GSVD algorithm proposed in [25] is summarized as follows (a code sketch is given after the list):

1. Compute \( H_{b} \) and \( H_{w} \) by (6) and (7).

2. Compute the singular value decomposition (SVD) of \( H = \left( {\begin{array}{*{20}c} {H_{b}^{T} } \\ {H_{w}^{T} } \\ \end{array} } \right) \in R^{(c + M) \times n} \), which is \( X^{T} HY = \left( {\begin{array}{*{20}c} Z & 0 \\ 0 & 0 \\ \end{array} } \right) \).

3. Let d = rank(H) and compute the SVD of X(1:c, 1:d), which is \( U_{b}^{T} X(1{:}c,1{:}d)A = \Upgamma_{b} \).

4. Compute \( V = Y\left( {\begin{array}{*{20}c} {Z^{ - 1} A} & 0 \\ 0 & I \\ \end{array} } \right) \) and let \( G_{g} = V(:,1{:}c - 1) \).

5. For a sample x, the new LDA/GSVD feature is \( y = G_{g}^{T} x \).
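A minimal NumPy sketch of these steps might look as follows. It reuses the hypothetical precursor_matrices helper above; the relative rank threshold is illustrative, and only the first c − 1 columns of V are formed since only they are used.

```python
import numpy as np

def lda_gsvd(X, y, tol=1e-10):
    """A sketch of the LDA/GSVD steps above (samples are stored as the columns of X)."""
    c = np.unique(y).size
    H_b, H_w = precursor_matrices(X, y)

    # Step 2: SVD of the stacked matrix H = [H_b^T; H_w^T].
    H = np.vstack([H_b.T, H_w.T])                   # (c + M) x n
    Xu, s, Yt = np.linalg.svd(H, full_matrices=False)
    Y = Yt.T

    # Step 3: d = rank(H); the SVD of the c x d block Xu(1:c, 1:d) yields A.
    d = int(np.sum(s > s[0] * tol))
    Z = np.diag(s[:d])
    _, _, At = np.linalg.svd(Xu[:c, :d])
    A = At.T

    # Step 4: only the first c - 1 columns of V = Y * blockdiag(Z^{-1} A, I) are needed.
    V1 = Y[:, :d] @ np.linalg.solve(Z, A)
    return V1[:, :c - 1]                            # G_g; LDA/GSVD features are G_g.T @ x
```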

3 From NLDA to LDA/GSVD: their relation under a condition

NLDA and LDA/GSVD solve the SSS problem of LDA in different ways and, at first sight, seem unrelated. In this section, we present the relation between NLDA and LDA/GSVD under a condition that usually holds for high-dimensional data.

3.1 A condition

Assume that the columns of the training sample matrix \( X = [X_{1} ,X_{2} , \,\ldots,X_{c} ] \) are linearly independent, where \( X_{i} = [x_{i1} ,x_{i2} , \,\ldots,x_{{il_{i} }} ] \in R^{{n \times l_{i} }} \) is the training sample matrix of class i. As presented in [19], under this assumption the following condition always holds:

$$ {\text{rank}}(S_{t} ) = M - 1,{\text{rank}}(S_{b} ) = c - 1,{\text{rank}}(S_{w} ) = M - c. $$
(14)

Because linear independence of the training samples often holds when the data have high dimensionality, condition (14) is satisfied for high-dimensional data in many cases. When condition (14) is satisfied, we have

$$ {\text{rank}}(S_{b} ) + {\text{rank}}(S_{w} ) = {\text{rank}}(S_{t} ). $$
(15)

In the following analysis, we suppose that condition (14) is always satisfied.
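In practice, condition (14) can be checked numerically from the scatter matrices. A minimal sketch, reusing the hypothetical scatter_matrices helper from Sect. 1, is given below.

```python
import numpy as np

def condition_14_holds(X, y):
    """Check rank(S_t) = M - 1, rank(S_b) = c - 1, rank(S_w) = M - c, i.e. Eq. (14)."""
    S_b, S_w, S_t = scatter_matrices(X, y)
    M = X.shape[1]
    c = np.unique(y).size
    ranks = tuple(np.linalg.matrix_rank(S) for S in (S_t, S_b, S_w))
    return ranks == (M - 1, c - 1, M - c)
```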

3.2 Relation between NLDA and LDA/GSVD

From (12) (13), we know

$$ {\text{rank}}(S_{b} ) = {\text{rank}}(V^{T} S_{b} V) = j + k, $$
(16)
$$ {\text{rank}}(S_{w} ) = {\text{rank}}(V^{T} S_{w} V) = d - k - j + k = d - j $$
(17)

and

$$ V^{T} S_{t} V = V^{T} S_{b} V + V^{T} S_{w} V = \left( {\begin{array}{*{20}c} {I_{j} } & {} & {} & {} \\ {} & {\Uplambda_{k} + \Upsigma_{k} } & {} & {} \\ {} & {} & {I_{d - j - k} } & {} \\ {} & {} & {} & {0_{n - d} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {I_{d} } & {} \\ {} & {0_{n - d} } \\ \end{array} } \right). $$
(18)

It follows from (18) that

$$ {\text{rank}}(S_{t} ) = {\text{rank}}(V^{T} S_{t} V) = d. $$
(19)

If condition (14) is satisfied, from (15) (16) (17) we get

$$ d = {\text{rank}}(S_{t} ) = {\text{rank}}(S_{b} ) + {\text{rank}}(S_{w} ) = j + k + d - j = d + k. $$
(20)

It follows from (20) that k = 0. Then (12) (13) become

$$ V^{T} S_{b} V = \left( {\begin{array}{*{20}c} {I_{j} } & {} & {} \\ {} & {0_{d - j} } & {} \\ {} & {} & {0_{n - d} } \\ \end{array} } \right) $$
(21)

and

$$ V^{T} S_{w} V = \left( {\begin{array}{*{20}c} {0_{j} } & {} & {} \\ {} & {I_{d - j} } & {} \\ {} & {} & {0_{n - d} } \\ \end{array} } \right). $$
(22)

As stated in Sect. 2.2, the first j = rank(\( S_{b} \)) = c − 1 columns of V form the projection matrix \( G_{g} \) of LDA/GSVD. Hence,

$$ G_{g}^{T} S_{b} G_{g} = I_{j} = I_{c - 1} $$
(23)

and

$$ G_{g}^{T} S_{w} G_{g} = 0_{j} = 0_{c - 1} . $$
(24)

From Sect. 2.1, the projection matrix of NLDA, \( G_{n} = PQR \), satisfies

$$ G_{n}^{T} S_{b} G_{n} = R^{T} Q^{T} P^{T} S_{b} PQR = R^{T} Q^{T} \tilde{S}_{b} QR = R^{T} \hat{S}_{b} R = \Uplambda_{f} $$
(25)

and

$$ G_{n}^{T} S_{w} G_{n} = R^{T} Q^{T} P^{T} S_{w} PQR = R^{T} Q^{T} \tilde{S}_{w} QR = R^{T} 0_{s} R = 0_{f} . $$
(26)

Moreover, f = rank(\( S_{b} \)) = c − 1 under condition (14). So

$$ G_{n}^{T} S_{b} G_{n} = \Uplambda_{c - 1} $$
(27)

and

$$ G_{n}^{T} S_{w} G_{n} = 0_{c - 1} . $$
(28)

From (23), (24), (27), and (28), we can conclude that both \( G_{g} \) and \( G_{n} \) are derived from the intersection of the null space of \( S_{w} \) and the range space of \( S_{b} \); the only difference between the two algorithms is the scaling, i.e., the diagonal matrix \( \Uplambda_{c - 1} \) versus the identity \( I_{c - 1} \).

4 Modified NLDA

In this section, we develop a modified NLDA (MNLDA) algorithm that is equivalent to LDA/GSVD in discriminating power but much more efficient. Furthermore, the discriminating power and efficiency of MNLDA, LDA/GSVD, and NLDA are compared.

4.1 The algorithm

From Sect. 3.2, we know that when condition (14) holds, the only difference between NLDA and LDA/GSVD is the scaling difference between \( \Uplambda_{c - 1} \) and \( I_{c - 1} \). If the scaling of NLDA is modified accordingly, it becomes equivalent to LDA/GSVD. The MNLDA algorithm is as follows (a code sketch is given after the list):

1. Calculate \( S_{t} \) by (4). Compute \( S_{t} \)'s eigenvectors \( p_{1} , p_{2} , \ldots , p_{d} \) corresponding to nonzero eigenvalues.

2. Calculate \( S_{b} \) and \( S_{w} \) by (2) and (3). Let \( P = (p_{1} , p_{2} , \ldots , p_{d} ) \) and \( \Uplambda = P^{T} S_{t} P \). We obtain the new scatter matrices \( \tilde{S}_{b} = \Uplambda^{{ - \frac{1}{2}}} P^{T} S_{b} P\Uplambda^{{ - \frac{1}{2}}} \) and \( \tilde{S}_{w} = \Uplambda^{{ - \frac{1}{2}}} P^{T} S_{w} P\Uplambda^{{ - \frac{1}{2}}} \) after the removal of the null space of \( S_{t} \) and the normalization by Λ.

3. Compute \( \tilde{S}_{w} \)'s eigenvectors \( q_{1} , q_{2} , \ldots , q_{s} \) corresponding to zero eigenvalues. The projection matrix of MNLDA is \( G_{m} = P\Uplambda^{{ - \frac{1}{2}}} Q \), where \( Q = (q_{1} , q_{2} , \ldots , q_{s} ) \).

4. For a sample x, the new MNLDA feature is \( y = G_{m}^{T} x \).
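A minimal NumPy sketch of these steps might look as follows, again reusing the hypothetical scatter_matrices helper from Sect. 1; the relative eigenvalue thresholds are illustrative.

```python
import numpy as np

def mnlda(X, y, tol=1e-10):
    """A sketch of the MNLDA steps above (samples are stored as the columns of X)."""
    S_b, S_w, S_t = scatter_matrices(X, y)

    # Step 1: eigenvectors of S_t with nonzero eigenvalues.
    t_vals, t_vecs = np.linalg.eigh(S_t)
    keep = t_vals > t_vals.max() * tol
    P, lam = t_vecs[:, keep], t_vals[keep]          # Lambda = diag(lam)

    # Step 2: remove the null space of S_t and normalize by Lambda^(-1/2).
    W = P / np.sqrt(lam)                            # P @ Lambda^(-1/2)
    Sw_r = W.T @ S_w @ W

    # Step 3: null space of the normalized within-class scatter matrix.
    w_vals, w_vecs = np.linalg.eigh(Sw_r)
    Q = w_vecs[:, w_vals <= w_vals.max() * tol]

    return W @ Q                                    # G_m; MNLDA features are G_m.T @ x
```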

From the MNLDA algorithm, we have

$$ G_{m}^{T} S_{w} G_{m} = Q^{T} \Uplambda^{{ - \frac{1}{2}}} P^{T} S_{w} P\Uplambda^{{ - \frac{1}{2}}} Q = Q^{T} \tilde{S}_{w} Q = 0_{s} $$
(29)

and

$$ G_{m}^{T} S_{t} G_{m} = Q^{T} \Uplambda^{{ - \frac{1}{2}}} P^{T} S_{t} P\Uplambda^{{ - \frac{1}{2}}} Q = Q^{T} I_{d} Q = I_{s} . $$
(30)

It follows that

$$ G_{m}^{T} S_{b} G_{m} = G_{m}^{T} S_{t} G_{m} - G_{m}^{T} S_{w} G_{m} = I_{s} - 0_{s} = I_{s} . $$
(31)

Since rank(\( S_{t} \)) = M − 1 and rank(\( S_{w} \)) = M − c under condition (14), we have s = (M − 1) − (M − c) = c − 1. Hence

$$ G_{m}^{T} S_{b} G_{m} = I_{c - 1} $$
(32)

and

$$ G_{m}^{T} S_{w} G_{m} = 0_{c - 1} . $$
(33)

From (23), (24), (32), and (33), we conclude that MNLDA is equivalent to LDA/GSVD.
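As a sanity check, the properties (23), (24), (32), and (33) can be verified numerically with the sketches given earlier. The snippet below uses synthetic high-dimensional data (n much larger than M, so condition (14) holds almost surely); the data and tolerances are purely illustrative.

```python
import numpy as np

# Reuses the hypothetical scatter_matrices, lda_gsvd, and mnlda sketches above.
rng = np.random.default_rng(0)
n, c, per_class = 200, 5, 4                         # M = 20 training samples
X = rng.standard_normal((n, c * per_class))
y = np.repeat(np.arange(c), per_class)

S_b, S_w, _ = scatter_matrices(X, y)
for G in (lda_gsvd(X, y), mnlda(X, y)):             # G_g and G_m
    assert np.allclose(G.T @ S_b @ G, np.eye(c - 1), atol=1e-6)   # Eqs. (23), (32)
    assert np.allclose(G.T @ S_w @ G, 0.0, atol=1e-6)             # Eqs. (24), (33)
```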

4.2 Comparison of MNLDA, LDA/GSVD, and NLDA

The SVD in step 2 of the LDA/GSVD algorithm is time-consuming, especially for high-dimensional data, whereas the eigenvalue decomposition (EVD) used in the MNLDA and NLDA algorithms is much faster. Hence, NLDA and MNLDA are more efficient than LDA/GSVD.

From the above discussion, MNLDA and LDA/GSVD have the same discriminating power. We now compare the discriminating capability of MNLDA and NLDA. In [19], the authors indicated that step 4 of the NLDA algorithm can be removed and proposed the improved NLDA (INLDA) algorithm, which is equivalent to NLDA when condition (14) is satisfied. The only difference between INLDA and MNLDA is the normalization performed after the diagonalization of \( S_{t} \). From the equivalence of NLDA and INLDA, we know that it is only this normalization that leads to the different discriminating power of MNLDA and NLDA. In the experiments, it can be seen that the normalization sometimes increases and sometimes reduces the discriminating capability, depending mainly on the characteristics of the data.

5 Experiments

In this section, we verify our conclusions through face and finger-knuckle-print recognition experiments. NLDA, LDA/GSVD, and MNLDA are used for feature extraction separately, and the dimensionality of the reduced feature space is kept at c − 1. The nearest neighbor classifier with the Euclidean metric is then employed for classification.
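For reference, a minimal NumPy sketch of the nearest neighbor classifier with the Euclidean metric over the extracted features is shown below; the function name nearest_neighbor is ours, and labels are assumed to be NumPy arrays.

```python
import numpy as np

def nearest_neighbor(train_feats, train_labels, test_feats):
    """1-NN classification with the Euclidean metric; feature vectors are stored
    as columns (i.e. train_feats = G.T @ X_train for a projection matrix G)."""
    d2 = (np.sum(test_feats ** 2, axis=0)[:, None]
          + np.sum(train_feats ** 2, axis=0)[None, :]
          - 2.0 * test_feats.T @ train_feats)       # squared Euclidean distances
    return train_labels[np.argmin(d2, axis=1)]

# Typical use, with G the projection matrix of NLDA, LDA/GSVD, or MNLDA:
#   pred = nearest_neighbor(G.T @ X_train, y_train, G.T @ X_test)
#   accuracy = np.mean(pred == y_test)
```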

5.1 Face recognition experiment

5.1.1 Face data sets

The ORL [29], FERET [30–32], and Yale [33] face databases are used in our experiments.

5.1.1.1 ORL face database

The ORL face database contains 400 face images of 40 individuals, each providing 10 different images. For some individuals, the images were taken at different times. The face images are centered. The facial expressions (open or closed eyes, smiling or nonsmiling) and facial details (glasses or no glasses) also vary. The images were taken with a tolerance for some tilting and rotation of the face of up to 20°, and there is also some variation in scale of up to 10%. The images are all grayscale and normalized to a resolution of 92 × 112 pixels. Figure 1 shows ten images of one person.

Fig. 1 Sample images of one person on the ORL database

5.1.1.2 FERET face database

The FERET face database was sponsored by the US Department of Defense through the DARPA program and has become a standard database for testing and evaluating face recognition algorithms. We performed the algorithms on a subset of the FERET database composed of 1,400 images of 200 individuals, each individual having seven images. It involves variations in facial expression, pose, and illumination. In the experiment, the facial portion of each original image was cropped based on the locations of the eyes and mouth, and the cropped images were resized to 80 × 80 pixels. Seven sample images of one individual are shown in Fig. 2.

Fig. 2 Sample images of one person on the FERET database

5.1.1.3 Yale face database

The Yale face database was constructed at the Yale Center for Computational Vision and Control. It contains 165 grayscale images of 15 individuals; each person has 11 different images under various facial expressions (normal, happy, sad, sleepy, surprised, and wink) and lighting conditions (left-light, center-light, right-light), and with or without glasses. In the experiment, each image was manually cropped and resized to 100 × 80 pixels. Figure 3 shows sample images of one person.

Fig. 3 Sample images of one person on the Yale database

5.1.2 Experimental results

On the ORL database, three images per person are randomly selected for training, and the remaining images are used for testing. Features are extracted separately by NLDA, LDA/GSVD, and MNLDA, and the nearest neighbor classifier with the Euclidean distance is applied to classify the new features. The experiment is repeated 10 times. Figure 4 shows the recognition rates versus the 10 different training sets. Table 1 lists the average recognition rates and standard deviations across the 10 runs, and Table 2 lists the computational time of each method.

Fig. 4 The recognition rates of MNLDA, LDA/GSVD, and NLDA versus different training sets on the ORL database when three images per person are randomly used for training

Table 1 The average recognition rates (%) and the corresponding standard deviations of MNLDA, LDA/GSVD, and NLDA on the ORL database across 10 runs when three images per individual are randomly used for training
Table 2 The computational time (s) of MNLDA, LDA/GSVD, and NLDA on the ORL database across 10 runs when three images per individual are randomly used for training (CPU: Pentium 1.8 GHz, RAM: 1 GB)

Figure 4 shows that MNLDA consistently has the same recognition rates as LDA/GSVD and that NLDA outperforms MNLDA and LDA/GSVD regardless of the training set. From Table 1, the average recognition rate of MNLDA and LDA/GSVD is only 86.1%, while NLDA reaches 90.5%. Table 2 shows that NLDA and MNLDA are more efficient than LDA/GSVD; the average computational time of LDA/GSVD is about two hundred times that of NLDA.

On the FERET database, we select the first α images per person (α varies from 2 to 5) for training and the remaining images for testing. After feature extraction by NLDA, LDA/GSVD, and MNLDA, the nearest neighbor classifier with the Euclidean distance is employed for classification. Figure 5 illustrates the recognition rates versus the training sample size, and the computational time of each method is shown in Table 3.

Fig. 5 The recognition rates of MNLDA, LDA/GSVD, and NLDA versus the training sample size on the FERET database

Table 3 The computational time (s) of MNLDA, LDA/GSVD, and NLDA on the FERET database when the first α (α varies from 2 to 5) images are used for training (CPU: Pentium 1.8 GHz, RAM: 1 GB)

From Fig. 5, we again see that MNLDA and LDA/GSVD have identical recognition rates and that NLDA achieves higher recognition rates than both, irrespective of the training sample size. From Table 3, MNLDA and NLDA are clearly more efficient than LDA/GSVD.

On the Yale database, the first β images per person (β varies from 2 to 7) are selected to form the training set, and the remaining 11 − β images are used for testing. NLDA, LDA/GSVD, and MNLDA are used for feature extraction separately, and the nearest neighbor classifier with the Euclidean distance is then used for classification. Figure 6 shows the recognition rates versus the training sample size, and the computational time of each method is listed in Table 4.

Fig. 6 The recognition rates of MNLDA, LDA/GSVD, and NLDA versus the training sample size on the Yale database

Table 4 The computational time (s) of MNLDA, LDA/GSVD, and NLDA on the Yale database when the first β (β varies from 2 to 7) images are used for training (CPU: Pentium 1.8 GHz, RAM: 1 GB)

From Table 4, we can see that MNLDA and NLDA have higher efficiency than LDA/GSVD. Figure 6 shows that the recognition rates of MNLDA are the same as those of LDA/GSVD irrespective of the training sample size. These two points are consistent with the experimental results on the ORL and FERET databases. However, Fig. 6 also reveals an inconsistent point: on the Yale database, MNLDA and LDA/GSVD outperform NLDA.

Why does NLDA outperform MNLDA on the ORL and FERET databases, while MNLDA outperforms NLDA on the Yale database? We suspect it can be explained as follows. From Sect. 4, we know that it is only the normalization performed after the diagonalization of \( S_{t} \) that leads to the different discriminating power of MNLDA and NLDA. In [34], the authors point out that the principal eigenvectors of \( S_{t} \) contain two components: the intrinsic difference (\( \tilde{I} \)), which discriminates between different face identities, and the transformation difference (\( \tilde{T} \)), which arises from all kinds of transformations such as expression and illumination. On the ORL and FERET databases, \( \tilde{I} \) may be contained in the principal eigenvectors corresponding to large eigenvalues, while \( \tilde{T} \) corresponds to small eigenvalues. The normalization after the diagonalization of \( S_{t} \) then enlarges the effect of \( \tilde{T} \) and reduces the effect of \( \tilde{I} \), which is important for recognition, so MNLDA, which involves the normalization, performs worse. Conversely, on the Yale database, \( \tilde{I} \) may correspond to small eigenvalues and \( \tilde{T} \) to large ones, so the normalization enlarges the effect of \( \tilde{I} \) and reduces the effect of \( \tilde{T} \); hence MNLDA performs better.

5.2 Finger-knuckle-print recognition experiment on the PolyU FKP database

Finger-knuckle-print (FKP) images in the PolyU FKP database were collected from 165 volunteers, including 125 men and 40 women. Among them, 143 subjects were 20–30 years old and the others were 30–50 years old. The samples were collected in two separate sessions. In each session, each subject was asked to provide 6 images of each of the left index finger, the left middle finger, the right index finger, and the right middle finger, so 48 images from 4 fingers were collected per subject. In total, the database contains 7,920 images from 660 different fingers. The average time interval between the first and second sessions was about 25 days; the maximum and minimum intervals were 96 and 14 days, respectively. The images were processed by the ROI extraction algorithm described in [35]. In the experiment, we selected 1,200 FKP images of the right index finger of 100 subjects, and these images were resized to 55 × 110 pixels. Figure 7 shows 12 sample images of one right index finger.

Fig. 7 Sample images of one right index finger

The first six images, collected in the first session, are used for training, and the remaining six, collected in the second session, are used for testing. First, LDA/GSVD and MNLDA are performed for feature extraction; then, the nearest neighbor classifier is used for classification. Table 5 lists the recognition rates and the computational time of the two methods. Table 5 shows that MNLDA is much more efficient than LDA/GSVD. However, it also shows that LDA/GSVD and MNLDA obtain different recognition rates, which is not consistent with the results of the face recognition experiments. Why does this happen? From the analysis above, LDA/GSVD and MNLDA have the same discriminating capability when condition (14) is satisfied. The number of training samples is 600 in this experiment, but we find that rank(\( S_{t} \)) = 593 ≠ 600 − 1, so condition (14) is not satisfied and the equivalence of LDA/GSVD and MNLDA does not hold.

Table 5 The recognition rates (%) and computational time (s) of MNLDA and LDA/GSVD on the PolyU FKP database when the first six FKP images are used for training (CPU: Pentium 1.8 GHz, RAM: 1 GB)

6 Conclusion

In this paper, we analyze the NLDA and LDA/GSVD algorithms and show their relationship under a condition that is often satisfied for high-dimensional data. Based on this relationship, the MNLDA algorithm, which is equivalent to LDA/GSVD, is proposed. When the given condition is satisfied, MNLDA has the same discriminating power as LDA/GSVD but is more efficient. We also compare the efficiency and discriminating power of NLDA and LDA/GSVD; like MNLDA, NLDA is more efficient than LDA/GSVD. Face recognition experiments demonstrate the equivalence of LDA/GSVD and MNLDA under the given condition and confirm that MNLDA is more efficient. The FKP recognition experiment shows that the two algorithms have different discriminating power when the given condition does not hold. Moreover, the different discriminating power of NLDA and MNLDA mainly depends on the characteristics of the data; we give our interpretation of this based on the face recognition experiments.