1 Introduction

In pattern recognition, the data often lie in a high-dimensional space. In order to analyze such data, we need to reduce the dimensionality and find the most important features. Feature extraction is an essential technique for this problem, and it has attracted much attention in computer vision and pattern recognition. In the past decades, numerous feature extraction methods have been proposed [1–12]. Linear discriminant analysis (LDA) [13] is one of the most popular feature extraction and dimensionality reduction methods and has been widely applied to high-dimensional pattern recognition problems [14–16]. It seeks the optimal projection vectors by simultaneously maximizing the between-class scatter and minimizing the within-class scatter. The criterion of LDA is to maximize

$$ J_{F} (W) = \frac{{J_{b} (W)}}{{J_{w} (W)}} = \frac{{{\text{trace}}(W^{T} S_{b} W)}}{{{\text{trace}}(W^{T} S_{w} W)}}, $$
(1)

where \( S_{b} \) is the between-class scatter matrix, \( S_{w} \) is the within-class scatter matrix, and W is the projection matrix composed of the optimal projection vectors. Suppose that we have M training samples from c classes; then \( S_{b} \), \( S_{w} \), and the global scatter matrix \( S_{t} \) are defined by

$$ S_{b} = \frac{1}{M}\sum\limits_{i = 1}^{c} {l_{i} (m_{i} - m)} (m_{i} - m)^{T} , $$
(2)
$$ S_{w} = \frac{1}{M}\sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{{l_{i} }} {(x_{ij} - m_{i} )(x_{ij} - m_{i} )^{T} } } $$
(3)

and

$$ S_{t} = \frac{1}{M}\sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{{l_{i} }} {(x_{ij} - m)(x_{ij} - m)^{T} } } , $$
(4)

where \( x_{ij} \) is the jth training sample in class i, m is the global centroid, \( m_{i} \) (i = 1, 2, …, c) is the centroid of class i, and \( l_{i} \) (i = 1, 2, …, c) is the number of training samples in class i.
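For concreteness, the scatter matrices of Eqs. (2)–(4) can be computed with a few lines of NumPy. The following is only a minimal sketch; the helper name scatter_matrices and the column-wise data layout are our choices for illustration.

```python
import numpy as np

def scatter_matrices(X, y):
    """Compute S_b, S_w, and S_t of Eqs. (2)-(4).
    X is n x M with one training sample per column; y holds the M class labels."""
    n, M = X.shape
    m = X.mean(axis=1, keepdims=True)               # global centroid
    S_b = np.zeros((n, n))
    S_w = np.zeros((n, n))
    for label in np.unique(y):
        Xi = X[:, y == label]                       # samples of class i
        li = Xi.shape[1]
        mi = Xi.mean(axis=1, keepdims=True)         # class centroid
        S_b += li * (mi - m) @ (mi - m).T
        S_w += (Xi - mi) @ (Xi - mi).T
    S_b /= M
    S_w /= M
    return S_b, S_w, S_b + S_w                      # S_t = S_b + S_w, i.e. Eq. (4)
```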

From formula (3), it is easy to see that the rank of \( S_{w} \) is at most the minimum of the dimensionality of the samples and the number of training samples. When the dimensionality exceeds the number of training samples, \( S_{w} \) is singular and classical LDA cannot be performed. This is the so-called small sample size (SSS) problem, namely the number of training samples is small relative to the dimensionality of the samples. Several approaches have been proposed in the past few years to solve this problem. Among them, regularized LDA (RLDA) [17], PCA (principal component analysis) plus LDA (FDA) [14], null space LDA (NLDA) [18, 19], direct LDA (DLDA) [20], complete LDA (CLDA) [21, 22], inverse Fisher discriminant analysis (IFDA) [23], and LDA based on generalized singular value decomposition (LDA/GSVD) [24, 25] are the most well known. RLDA makes \( S_{w} \) nonsingular by adding a diagonal matrix αI (α > 0) to \( S_{w} \). In FDA, PCA [26] is first performed for dimensionality reduction to make \( S_{w} \) nonsingular before LDA is applied. NLDA assumes that the null space of \( S_{w} \) contains the most useful discriminative information and seeks projection vectors in this null space. DLDA searches for discriminative information in the range space of \( S_{b} \) and solves the Fisher criterion (1) by diagonalizing \( S_{b} \) before diagonalizing \( S_{w} \). CLDA extracts features separately from the null space and the range space of \( S_{w} \). IFDA adds a restriction to the PCA procedure and introduces an inverse Fisher criterion. LDA/GSVD uses the generalized singular value decomposition (GSVD) to solve the singularity problem in classical LDA.

We analyze NLDA and LDA/GSVD theoretically and show their relationship under a condition that usually holds for high-dimensional data. The analysis reveals that there is only a small difference between NLDA and LDA/GSVD: with a slight modification, NLDA becomes equivalent to LDA/GSVD. Since the main disadvantage of LDA/GSVD is its high computational cost, whereas NLDA is much more efficient, we propose a modified NLDA (MNLDA) algorithm in this paper. When the given condition is satisfied, MNLDA has the same discriminating power as LDA/GSVD but is more efficient. In addition, we compare the discriminating power of NLDA and MNLDA: NLDA outperforms MNLDA on some data sets, and MNLDA performs better on others, depending mainly on the characteristics of the data. We give our interpretation of this behavior based on the experiments.

The rest of the paper is organized as follows: Sect. 2 describes NLDA and LDA/GSVD. We analyze the relation between NLDA and LDA/GSVD in Sect. 3. Section 4 proposes the MNLDA algorithm and compares MNLDA, LDA/GSVD, and NLDA. Section 5 describes experiments on the ORL, FERET, and Yale face databases and the Hong Kong Polytechnic University finger-knuckle-print (PolyU FKP) database. Finally, our conclusions are given in Sect. 6.

2 NLDA and LDA/GSVD

2.1 NLDA

Chen et al. [18] proved that the null space of \( S_{w} \) contains the most discriminative information and proposed the NLDA algorithm, which searches for projection vectors in the null space of \( S_{w} \). The criterion of NLDA is defined by

$$ W = \arg \max_{{W^{T} S_{w} W = 0}} {\text{trace}}(W^{T} S_{b} W). $$
(5)

In order to improve the efficiency of the algorithm, the authors of [18] used a pixel grouping method to reduce the dimensionality of the samples before performing NLDA. Based on the observation that the null space of \( S_{t} \) contains no discriminative information, Huang et al. [27] first removed the null space of \( S_{t} \) and then performed NLDA.

The NLDA algorithm is summarized as follows (a code sketch is given after the list):

1. Calculate \( S_{t} \) by (4). Compute \( S_{t} \)'s eigenvectors \( p_{1} , p_{2} , \ldots , p_{d} \) corresponding to nonzero eigenvalues.

2. Calculate \( S_{b} \) and \( S_{w} \) by (2) and (3). Let \( P = (p_{1} , p_{2} , \ldots , p_{d} ) \). We obtain the new scatter matrices \( \tilde{S}_{b} = P^{T} S_{b} P \) and \( \tilde{S}_{w} = P^{T} S_{w} P \) after the removal of the null space of \( S_{t} \).

3. Compute \( \tilde{S}_{w} \)'s eigenvectors \( q_{1} , q_{2} , \ldots , q_{s} \) corresponding to zero eigenvalues and project the data into the null space of \( \tilde{S}_{w} \). The between-class scatter matrix becomes \( \hat{S}_{b} = Q^{T} \tilde{S}_{b} Q \), where \( Q = (q_{1} , q_{2} , \ldots , q_{s} ) \).

4. Compute \( \hat{S}_{b} \)'s eigenvectors \( r_{1} , r_{2} , \ldots , r_{f} \) corresponding to nonzero eigenvalues. Let \( R = (r_{1} , r_{2} , \ldots , r_{f} ) \). The projection matrix of NLDA is \( G_{n} = PQR \).

5. For a sample x, the new NLDA feature is \( y = G_{n}^{T} x \).
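A minimal NumPy sketch of these five steps might look as follows. It reuses the hypothetical scatter_matrices helper from Sect. 1, and the relative eigenvalue thresholds are illustrative rather than part of the original algorithm.

```python
import numpy as np

def nlda(X, y, tol=1e-10):
    """A sketch of the NLDA steps above (samples are stored as the columns of X)."""
    S_b, S_w, S_t = scatter_matrices(X, y)

    # Step 1: eigenvectors of S_t with nonzero eigenvalues (drop the null space of S_t).
    t_vals, t_vecs = np.linalg.eigh(S_t)
    P = t_vecs[:, t_vals > t_vals.max() * tol]

    # Step 2: scatter matrices after the removal of the null space of S_t.
    Sb_r = P.T @ S_b @ P
    Sw_r = P.T @ S_w @ P

    # Step 3: null space of the reduced within-class scatter matrix.
    w_vals, w_vecs = np.linalg.eigh(Sw_r)
    Q = w_vecs[:, w_vals <= w_vals.max() * tol]

    # Step 4: eigenvectors of Q^T Sb_r Q with nonzero eigenvalues.
    b_vals, b_vecs = np.linalg.eigh(Q.T @ Sb_r @ Q)
    R = b_vecs[:, b_vals > b_vals.max() * tol]

    return P @ Q @ R                    # G_n; NLDA features are G_n.T @ x
```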

2.2 LDA/GSVD

Howland et al. [24] applied the GSVD due to Paige et al. [28] to solve the SSS problem of LDA. For n-dimensional data, we define

$$ H_{b} = \left[ {\frac{{\sqrt {l_{1} } }}{\sqrt M }(m_{1} - m),\frac{{\sqrt {l_{2} } }}{\sqrt M }(m_{2} - m), \,\ldots ,\frac{{\sqrt {l_{c} } }}{\sqrt M }(m_{c} - m)} \right] $$
(6)

and

$$ H_{w} = \frac{1}{\sqrt M }[X_{1} - m_{1} e_{1} ,X_{2} - m_{2} e_{2} , \,\ldots,X_{c} - m_{c} e_{c} ], $$
(7)

where \( X_{i} = [x_{i1} ,x_{i2} , \,\ldots,x_{{il_{i} }} ] \in R^{{n \times l_{i} }} \) is the training sample matrix of class i and \( e_{i} = [1,1, \,\ldots,1] \in R^{{1 \times l_{i} }} \). Then \( S_{b} = H_{b} H_{b}^{T} \) and \( S_{w} = H_{w} H_{w}^{T} \). By applying the GSVD to \( H_{b}^{T} \) and \( H_{w}^{T} \), we obtain

$$ U_{b}^{T} H_{b}^{T} V = \left[\overbrace {{\Upgamma_{b} }}^{d},\overbrace {0}^{n - d}\right] $$
(8)

and

$$ U_{w}^{T} H_{w}^{T} V = \left[\overbrace {{\Upgamma_{w} }}^{d},\overbrace {0}^{n - d}\right], $$
(9)

where \( U_{b} \) and \( U_{w} \) are orthogonal, V is nonsingular, \( \Upgamma_{b}^{T} \Upgamma_{b} \) and \( \Upgamma_{w}^{T} \Upgamma_{w} \) are diagonal matrices satisfying \( \Upgamma_{b}^{T} \Upgamma_{b} \) + \( \Upgamma_{w}^{T} \Upgamma_{w} \) = I, and d is the rank of \( \left( {\begin{array}{*{20}c} {H_{b}^{T} } \\ {H_{w}^{T} } \\ \end{array} } \right) \). It follows from (8) and (9) that

$$ (U_{b}^{T} H_{b}^{T} V)^{T} U_{b}^{T} H_{b}^{T} V = V^{T} H_{b} H_{b}^{T} V = V^{T} S_{b} V = [\Upgamma_{b} ,0]^{T} [\Upgamma_{b} ,0] = \left( {\begin{array}{*{20}c} {\Upgamma_{b}^{T} \Upgamma_{b} } & 0 \\ 0 & 0 \\ \end{array} } \right) $$
(10)

and

$$ (U_{w}^{T} H_{w}^{T} V)^{T} U_{w}^{T} H_{w}^{T} V = V^{T} H_{w} H_{w}^{T} V = V^{T} S_{w} V = [\Upgamma_{w} ,0]^{T} [\Upgamma_{w} ,0] = \left( {\begin{array}{*{20}c} {\Upgamma_{w}^{T} \Upgamma_{w} } & 0 \\ 0 & 0 \\ \end{array} } \right). $$
(11)

Because \( \Upgamma_{b}^{T} \Upgamma_{b} \) and \( \Upgamma_{w}^{T} \Upgamma_{w} \) are diagonal and \( \Upgamma_{b}^{T} \Upgamma_{b} \) + \( \Upgamma_{w}^{T} \Upgamma_{w} \) = I, we have

$$ V^{T} S_{b} V = \left( {\begin{array}{*{20}c} {I_{j} } & {} & {} & {} \\ {} & {\Uplambda_{k} } & {} & {} \\ {} & {} & {0_{d - k - j} } & {} \\ {} & {} & {} & {0_{n - d} } \\ \end{array} } \right) $$
(12)

and

$$ V^{T} S_{w} V = \left( {\begin{array}{*{20}c} {0_{j} } & {} & {} & {} \\ {} & {\Upsigma_{k} } & {} & {} \\ {} & {} & {I_{d - k - j} } & {} \\ {} & {} & {} & {0_{n - d} } \\ \end{array} } \right), $$
(13)

where \( \Uplambda_{k} \succ 0 \) and \( \Upsigma_{k} \succ 0 \) are diagonal, \( \Uplambda_{k} + \Upsigma_{k} = I_{k} \), and the subscripts of I, 0, Λ, and Σ denote the sizes of the square matrices.

From (12) and (13), it can be seen that V diagonalizes \( S_{b} \) and \( S_{w} \) simultaneously. The first j + k columns of V, which correspond to the range space of \( S_{b} \), are then selected as projection vectors. Since rank(\( S_{b} \)) = j + k ≤ c − 1, the projection matrix of LDA/GSVD is usually constructed from the first c − 1 columns of V.
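Before summarizing the algorithm, we note that the precursor matrices \( H_{b} \) and \( H_{w} \) of Eqs. (6) and (7) can be assembled directly from the training data. The following NumPy sketch illustrates this; the helper name precursor_matrices and the column-wise layout are our choices for illustration.

```python
import numpy as np

def precursor_matrices(X, y):
    """Assemble H_b and H_w of Eqs. (6)-(7), so that S_b = H_b H_b^T and
    S_w = H_w H_w^T (samples are stored as the columns of X)."""
    M = X.shape[1]
    m = X.mean(axis=1, keepdims=True)               # global centroid
    Hb_cols, Hw_cols = [], []
    for label in np.unique(y):
        Xi = X[:, y == label]                       # samples of class i
        li = Xi.shape[1]
        mi = Xi.mean(axis=1, keepdims=True)         # class centroid
        Hb_cols.append(np.sqrt(li / M) * (mi - m))  # one column per class
        Hw_cols.append((Xi - mi) / np.sqrt(M))      # l_i columns per class
    return np.hstack(Hb_cols), np.hstack(Hw_cols)   # n x c and n x M
```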

The LDA/GSVD algorithm proposed in [25] is summarized as follows (a code sketch is given after the list):

1. Compute \( H_{b} \) and \( H_{w} \) by (6) and (7).

2. Compute the singular value decomposition (SVD) of \( H = \left( {\begin{array}{*{20}c} {H_{b}^{T} } \\ {H_{w}^{T} } \\ \end{array} } \right) \in R^{(c + M) \times n} \), which is \( X^{T} HY = \left( {\begin{array}{*{20}c} Z & 0 \\ 0 & 0 \\ \end{array} } \right) \).

3. Let d = rank(H) and compute the SVD of X(1:c, 1:d), which is \( U_{b}^{T} X(1{:}c,1{:}d)A = \Upgamma_{b} \).

4. Compute \( V = Y\left( {\begin{array}{*{20}c} {Z^{ - 1} A} & 0 \\ 0 & I \\ \end{array} } \right) \) and let \( G_{g} = V(:,1{:}c - 1) \).

5. For a sample x, the new LDA/GSVD feature is \( y = G_{g}^{T} x \).
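A minimal NumPy sketch of these steps might look as follows. It reuses the hypothetical precursor_matrices helper above; the relative rank threshold is illustrative, and only the first c − 1 columns of V are formed since only they are used.

```python
import numpy as np

def lda_gsvd(X, y, tol=1e-10):
    """A sketch of the LDA/GSVD steps above (samples are stored as the columns of X)."""
    c = np.unique(y).size
    H_b, H_w = precursor_matrices(X, y)

    # Step 2: SVD of the stacked matrix H = [H_b^T; H_w^T].
    H = np.vstack([H_b.T, H_w.T])                   # (c + M) x n
    Xu, s, Yt = np.linalg.svd(H, full_matrices=False)
    Y = Yt.T

    # Step 3: d = rank(H); the SVD of the c x d block Xu(1:c, 1:d) yields A.
    d = int(np.sum(s > s[0] * tol))
    Z = np.diag(s[:d])
    _, _, At = np.linalg.svd(Xu[:c, :d])
    A = At.T

    # Step 4: only the first c - 1 columns of V = Y * blockdiag(Z^{-1} A, I) are needed.
    V1 = Y[:, :d] @ np.linalg.solve(Z, A)
    return V1[:, :c - 1]                            # G_g; LDA/GSVD features are G_g.T @ x
```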

3 From NLDA to LDA/GSVD: their relation under a condition

NLDA and LDA/GSVD solve the SSS problem of LDA in different ways and, at first sight, seem unrelated. In this section, we present the relation between NLDA and LDA/GSVD under a condition that usually holds for high-dimensional data.

3.1 A condition

Assume that the columns of the training sample matrix \( X = [X_{1} ,X_{2} , \,\ldots,X_{c} ] \) are linearly independent, where \( X_{i} = [x_{i1} ,x_{i2} , \,\ldots,x_{{il_{i} }} ] \in R^{{n \times l_{i} }} \) is the training sample matrix of class i. As presented in [19], under this assumption the following condition always holds:

$$ {\text{rank}}(S_{t} ) = M - 1,{\text{rank}}(S_{b} ) = c - 1,{\text{rank}}(S_{w} ) = M - c. $$
(14)

Because linear independence of the training samples often holds when the data have high dimensionality, condition (14) is satisfied for high-dimensional data in many cases. When condition (14) is satisfied, we have

$$ {\text{rank}}(S_{b} ) + {\text{rank}}(S_{w} ) = {\text{rank}}(S_{t} ). $$
(15)

In the following analysis, we suppose that condition (14) is always satisfied.
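In practice, condition (14) can be checked numerically from the scatter matrices. A minimal sketch, reusing the hypothetical scatter_matrices helper from Sect. 1, is given below.

```python
import numpy as np

def condition_14_holds(X, y):
    """Check rank(S_t) = M - 1, rank(S_b) = c - 1, rank(S_w) = M - c, i.e. Eq. (14)."""
    S_b, S_w, S_t = scatter_matrices(X, y)
    M = X.shape[1]
    c = np.unique(y).size
    ranks = tuple(np.linalg.matrix_rank(S) for S in (S_t, S_b, S_w))
    return ranks == (M - 1, c - 1, M - c)
```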

3.2 Relation between NLDA and LDA/GSVD

From (12) (13), we know

$$ {\text{rank}}(S_{b} ) = {\text{rank}}(V^{T} S_{b} V) = j + k, $$
(16)
$$ {\text{rank}}(S_{w} ) = {\text{rank}}(V^{T} S_{w} V) = d - k - j + k = d - j $$
(17)

and

$$ V^{T} S_{t} V = V^{T} S_{b} V + V^{T} S_{w} V = \left( {\begin{array}{*{20}c} {I_{j} } & {} & {} & {} \\ {} & {\Uplambda_{k} + \Upsigma_{k} } & {} & {} \\ {} & {} & {I_{d - j - k} } & {} \\ {} & {} & {} & {0_{n - d} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {I_{d} } & {} \\ {} & {0_{n - d} } \\ \end{array} } \right). $$
(18)

It follows from (18) that

$$ {\text{rank}}(S_{t} ) = {\text{rank}}(V^{T} S_{t} V) = d. $$
(19)

If condition (14) is satisfied, from (15) (16) (17) we get

$$ d = {\text{rank}}(S_{t} ) = {\text{rank}}(S_{b} ) + {\text{rank}}(S_{w} ) = j + k + d - j = d + k. $$
(20)

It follows from (20) that k = 0. Then (12) (13) become

$$ V^{T} S_{b} V = \left( {\begin{array}{*{20}c} {I_{j} } & {} & {} \\ {} & {0_{d - j} } & {} \\ {} & {} & {0_{n - d} } \\ \end{array} } \right) $$
(21)

and

$$ V^{T} S_{w} V = \left( {\begin{array}{*{20}c} {0_{j} } & {} & {} \\ {} & {I_{d - j} } & {} \\ {} & {} & {0_{n - d} } \\ \end{array} } \right). $$
(22)

As stated in Sect. 2.2, the first j = rank(\( S_{b} \)) = c − 1 columns of V form the projection matrix \( G_{g} \) of LDA/GSVD. Hence,

$$ G_{g}^{T} S_{b} G_{g} = I_{j} = I_{c - 1} $$
(23)

and

$$ G_{g}^{T} S_{w} G_{g} = 0_{j} = 0_{c - 1} . $$
(24)

From Sect. 2.1, the projection matrix of NLDA, \( G_{n} = PQR \), satisfies

$$ G_{n}^{T} S_{b} G_{n} = R^{T} Q^{T} P^{T} S_{b} PQR = R^{T} Q^{T} \tilde{S}_{b} QR = R^{T} \hat{S}_{b} R = \Uplambda_{f} $$
(25)

and

$$ G_{n}^{T} S_{w} G_{n} = R^{T} Q^{T} P^{T} S_{w} PQR = R^{T} Q^{T} \tilde{S}_{w} QR = R^{T} 0_{s} R = 0_{f} . $$
(26)

Moreover, f = rank(\( S_{b} \)) = c − 1 under condition (14). So

$$ G_{n}^{T} S_{b} G_{n} = \Uplambda_{c - 1} $$
(27)

and

$$ G_{n}^{T} S_{w} G_{n} = 0_{c - 1} . $$
(28)

From (23), (24), (27), and (28), we can conclude that both \( G_{g} \) and \( G_{n} \) are derived from the intersection of the null space of \( S_{w} \) and the range space of \( S_{b} \); the only difference between the two algorithms is the scaling, i.e., the diagonal matrix \( \Uplambda_{c - 1} \) versus the identity \( I_{c - 1} \).

4 Modified NLDA

In this section, we develop a modified NLDA (MNLDA) algorithm that is equivalent to LDA/GSVD in discriminating power but much more efficient. Furthermore, the discriminating power and efficiency of MNLDA, LDA/GSVD, and NLDA are compared.

4.1 The algorithm

From Sect. 3.2, we know that when condition (14) holds, the only difference between NLDA and LDA/GSVD is the scaling difference between \( \Uplambda_{c - 1} \) and \( I_{c - 1} \). If the scaling of NLDA is modified accordingly, it becomes equivalent to LDA/GSVD. The MNLDA algorithm is as follows (a code sketch is given after the list):

1. Calculate \( S_{t} \) by (4). Compute \( S_{t} \)'s eigenvectors \( p_{1} , p_{2} , \ldots , p_{d} \) corresponding to nonzero eigenvalues.

2. Calculate \( S_{b} \) and \( S_{w} \) by (2) and (3). Let \( P = (p_{1} , p_{2} , \ldots , p_{d} ) \) and \( \Uplambda = P^{T} S_{t} P \). We obtain the new scatter matrices \( \tilde{S}_{b} = \Uplambda^{{ - \frac{1}{2}}} P^{T} S_{b} P\Uplambda^{{ - \frac{1}{2}}} \) and \( \tilde{S}_{w} = \Uplambda^{{ - \frac{1}{2}}} P^{T} S_{w} P\Uplambda^{{ - \frac{1}{2}}} \) after the removal of the null space of \( S_{t} \) and the normalization by Λ.

3. Compute \( \tilde{S}_{w} \)'s eigenvectors \( q_{1} , q_{2} , \ldots , q_{s} \) corresponding to zero eigenvalues. The projection matrix of MNLDA is \( G_{m} = P\Uplambda^{{ - \frac{1}{2}}} Q \), where \( Q = (q_{1} , q_{2} , \ldots , q_{s} ) \).

4. For a sample x, the new MNLDA feature is \( y = G_{m}^{T} x \).
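A minimal NumPy sketch of these steps might look as follows, again reusing the hypothetical scatter_matrices helper from Sect. 1; the relative eigenvalue thresholds are illustrative.

```python
import numpy as np

def mnlda(X, y, tol=1e-10):
    """A sketch of the MNLDA steps above (samples are stored as the columns of X)."""
    S_b, S_w, S_t = scatter_matrices(X, y)

    # Step 1: eigenvectors of S_t with nonzero eigenvalues.
    t_vals, t_vecs = np.linalg.eigh(S_t)
    keep = t_vals > t_vals.max() * tol
    P, lam = t_vecs[:, keep], t_vals[keep]          # Lambda = diag(lam)

    # Step 2: remove the null space of S_t and normalize by Lambda^(-1/2).
    W = P / np.sqrt(lam)                            # P @ Lambda^(-1/2)
    Sw_r = W.T @ S_w @ W

    # Step 3: null space of the normalized within-class scatter matrix.
    w_vals, w_vecs = np.linalg.eigh(Sw_r)
    Q = w_vecs[:, w_vals <= w_vals.max() * tol]

    return W @ Q                                    # G_m; MNLDA features are G_m.T @ x
```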

From the MNLDA algorithm, we have

$$ G_{m}^{T} S_{w} G_{m} = Q^{T} \Uplambda^{{ - \frac{1}{2}}} P^{T} S_{w} P\Uplambda^{{ - \frac{1}{2}}} Q = Q^{T} \tilde{S}_{w} Q = 0_{s} $$
(29)

and

$$ G_{m}^{T} S_{t} G_{m} = Q^{T} \Uplambda^{{ - \frac{1}{2}}} P^{T} S_{t} P\Uplambda^{{ - \frac{1}{2}}} Q = Q^{T} I_{d} Q = I_{s} . $$
(30)

It follows that

$$ G_{m}^{T} S_{b} G_{m} = G_{m}^{T} S_{t} G_{m} - G_{m}^{T} S_{w} G_{m} = I_{s} - 0_{s} = I_{s} . $$
(31)

Since rank(\( S_{t} \)) = M − 1 and rank(\( S_{w} \)) = M − c under condition (14), we have s = (M − 1) − (M − c) = c − 1. Hence

$$ G_{m}^{T} S_{b} G_{m} = I_{c - 1} $$
(32)

and

$$ G_{m}^{T} S_{w} G_{m} = 0_{c - 1} . $$
(33)

From (23), (24), (32), and (33), we conclude that MNLDA is equivalent to LDA/GSVD.
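As a sanity check, the properties (23), (24), (32), and (33) can be verified numerically with the sketches given earlier. The snippet below uses synthetic high-dimensional data (n much larger than M, so condition (14) holds almost surely); the data and tolerances are purely illustrative.

```python
import numpy as np

# Reuses the hypothetical scatter_matrices, lda_gsvd, and mnlda sketches above.
rng = np.random.default_rng(0)
n, c, per_class = 200, 5, 4                         # M = 20 training samples
X = rng.standard_normal((n, c * per_class))
y = np.repeat(np.arange(c), per_class)

S_b, S_w, _ = scatter_matrices(X, y)
for G in (lda_gsvd(X, y), mnlda(X, y)):             # G_g and G_m
    assert np.allclose(G.T @ S_b @ G, np.eye(c - 1), atol=1e-6)   # Eqs. (23), (32)
    assert np.allclose(G.T @ S_w @ G, 0.0, atol=1e-6)             # Eqs. (24), (33)
```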

4.2 Comparison of MNLDA, LDA/GSVD, and NLDA

The SVD in step 2 of the LDA/GSVD algorithm is time-consuming, especially for high-dimensional data, whereas the eigenvalue decomposition (EVD) used in the MNLDA and NLDA algorithms is much faster. Hence, NLDA and MNLDA are more efficient than LDA/GSVD.

From the above discussion, MNLDA and LDA/GSVD have the same discriminating power. We now compare the discriminating capability of MNLDA and NLDA. In [19], the authors indicated that step 4 of the NLDA algorithm can be removed and proposed the improved NLDA (INLDA) algorithm, which is equivalent to NLDA when condition (14) is satisfied. The only difference between INLDA and MNLDA is the normalization performed after the diagonalization of \( S_{t} \). From the equivalence of NLDA and INLDA, we know that it is only this normalization that leads to the different discriminating power of MNLDA and NLDA. In the experiments, it can be seen that the normalization sometimes increases and sometimes reduces the discriminating capability, depending mainly on the characteristics of the data.

5 Experiments

In this section, we verify our conclusions through face and finger-knuckle-print recognition experiments. NLDA, LDA/GSVD, and MNLDA are used for feature extraction separately, and the dimensionality of the reduced feature space is kept at c − 1. The nearest neighbor classifier with the Euclidean metric is then employed for classification.
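For reference, a minimal NumPy sketch of the nearest neighbor classifier with the Euclidean metric over the extracted features is shown below; the function name nearest_neighbor is ours, and labels are assumed to be NumPy arrays.

```python
import numpy as np

def nearest_neighbor(train_feats, train_labels, test_feats):
    """1-NN classification with the Euclidean metric; feature vectors are stored
    as columns (i.e. train_feats = G.T @ X_train for a projection matrix G)."""
    d2 = (np.sum(test_feats ** 2, axis=0)[:, None]
          + np.sum(train_feats ** 2, axis=0)[None, :]
          - 2.0 * test_feats.T @ train_feats)       # squared Euclidean distances
    return train_labels[np.argmin(d2, axis=1)]

# Typical use, with G the projection matrix of NLDA, LDA/GSVD, or MNLDA:
#   pred = nearest_neighbor(G.T @ X_train, y_train, G.T @ X_test)
#   accuracy = np.mean(pred == y_test)
```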

5.1 Face recognition experiment

5.1.1 Face data sets

The ORL [29], FERET [30–32], and Yale [33] face databases are used in our experiments.

5.1.1.1 ORL face database

The ORL face database contains 400 face images of 40 individuals, each providing 10 different images. For some individuals, the images were taken at different times. The face images are centered. The facial expressions (open or closed eyes, smiling or nonsmiling) and facial details (glasses or no glasses) also vary. The images were taken with a tolerance for some tilting and rotation of the face of up to 20°, and there is also some variation in scale of up to 10%. The images are all grayscale and normalized to a resolution of 92 × 112 pixels. Figure 1 shows ten images of one person.

Fig. 1 Sample images of one person on the ORL database

5.1.1.2 FERET face database

The FERET face database was sponsored by the US Department of Defense through the DARPA program and has become a standard database for testing and evaluating face recognition algorithms. We performed the algorithms on a subset of the FERET database composed of 1,400 images of 200 individuals, each individual having seven images. It involves variations in facial expression, pose, and illumination. In the experiment, the facial portion of each original image was cropped based on the locations of the eyes and mouth, and the cropped images were resized to 80 × 80 pixels. Seven sample images of one individual are shown in Fig. 2.

Fig. 2 Sample images of one person on the FERET database

5.1.1.3 Yale face database

The Yale face database was constructed at the Yale Center for Computational Vision and Control. It contains 165 grayscale images of 15 individuals; each person has 11 different images under various facial expressions (normal, happy, sad, sleepy, surprised, and wink) and lighting conditions (left-light, center-light, right-light), and with or without glasses. In the experiment, each image was manually cropped and resized to 100 × 80 pixels. Figure 3 shows sample images of one person.

Fig. 3 Sample images of one person on the Yale database

5.1.2 Experimental results

On the ORL database, three images per person are randomly selected for training, and the remaining images are used for testing. Features are extracted separately by NLDA, LDA/GSVD, and MNLDA, and the nearest neighbor classifier with the Euclidean distance is applied to classify the new features. The experiment is repeated 10 times. Figure 4 shows the recognition rates versus the 10 different training sets. Table 1 lists the average recognition rates and standard deviations across the 10 runs, and Table 2 lists the computational time of each method.

Fig. 4 The recognition rates of MNLDA, LDA/GSVD, and NLDA versus different training sets on the ORL database when three images per person are randomly used for training

Table 1 The average recognition rates (%) and the corresponding standard deviations of MNLDA, LDA/GSVD, and NLDA on the ORL database across 10 runs when three images per individual are randomly used for training
Table 2 The computational time (s) of MNLDA, LDA/GSVD, and NLDA on the ORL database across 10 runs when three images per individual are randomly used for training (CPU: Pentium 1.8 GHz, RAM: 1 GB)

Figure 4 shows that MNLDA consistently has the same recognition rates as LDA/GSVD and that NLDA outperforms MNLDA and LDA/GSVD regardless of the training set. From Table 1, the average recognition rate of MNLDA and LDA/GSVD is only 86.1%, while NLDA reaches 90.5%. Table 2 shows that NLDA and MNLDA are more efficient than LDA/GSVD; the average computational time of LDA/GSVD is about two hundred times that of NLDA.

On the FERET database, we select the first α images per person (α varies from 2 to 5) for training and the remaining images for testing. After feature extraction by NLDA, LDA/GSVD, and MNLDA, the nearest neighbor classifier with the Euclidean distance is employed for classification. Figure 5 illustrates the recognition rates versus the training sample size, and the computational time of each method is shown in Table 3.

Fig. 5 The recognition rates of MNLDA, LDA/GSVD, and NLDA versus the training sample size on the FERET database

Table 3 The computational time (s) of MNLDA, LDA/GSVD, and NLDA on the FERET database when the first α (α varies from 2 to 5) images are used for training (CPU: Pentium 1.8 GHz, RAM: 1 GB)

From Fig. 5, we again see that MNLDA and LDA/GSVD have identical recognition rates and that NLDA achieves higher recognition rates than both, irrespective of the training sample size. From Table 3, MNLDA and NLDA are clearly more efficient than LDA/GSVD.

On the Yale database, the first β images per person (β varies from 2 to 7) are selected to form the training set, and the remaining 11 − β images are used for testing. NLDA, LDA/GSVD, and MNLDA are used for feature extraction separately, and the nearest neighbor classifier with the Euclidean distance is then used for classification. Figure 6 shows the recognition rates versus the training sample size, and the computational time of each method is listed in Table 4.

Fig. 6 The recognition rates of MNLDA, LDA/GSVD, and NLDA versus the training sample size on the Yale database

Table 4 The computational time (s) of MNLDA, LDA/GSVD, and NLDA on the Yale database when the first β (β varies from 2 to 7) images are used for training (CPU: Pentium 1.8 GHz, RAM: 1 GB)

From Table 4, we can see that MNLDA and NLDA have higher efficiency than LDA/GSVD. Figure 6 shows that the recognition rates of MNLDA are the same as those of LDA/GSVD irrespective of the training sample size. These two points are consistent with the experimental results on the ORL and FERET databases. However, Fig. 6 also reveals an inconsistent point: on the Yale database, MNLDA and LDA/GSVD outperform NLDA.

Why does NLDA outperform MNLDA on the ORL and FERET databases, while MNLDA outperforms NLDA on the Yale database? We suspect it can be explained as follows. From Sect. 4, we know that it is only the normalization performed after the diagonalization of \( S_{t} \) that leads to the different discriminating power of MNLDA and NLDA. In [34], the authors point out that the principal eigenvectors of \( S_{t} \) contain two components: the intrinsic difference (\( \tilde{I} \)), which discriminates between different face identities, and the transformation difference (\( \tilde{T} \)), which arises from all kinds of transformations such as expression and illumination. On the ORL and FERET databases, \( \tilde{I} \) may be contained in the principal eigenvectors corresponding to large eigenvalues, while \( \tilde{T} \) corresponds to small eigenvalues. The normalization after the diagonalization of \( S_{t} \) then enlarges the effect of \( \tilde{T} \) and reduces the effect of \( \tilde{I} \), which is important for recognition, so MNLDA, which involves the normalization, performs worse. Conversely, on the Yale database, \( \tilde{I} \) may correspond to small eigenvalues and \( \tilde{T} \) to large ones, so the normalization enlarges the effect of \( \tilde{I} \) and reduces the effect of \( \tilde{T} \); hence MNLDA performs better.

5.2 Finger-knuckle-print recognition experiment on the PolyU FKP database

Finger-knuckle-print (FKP) images in the PolyU FKP database were collected from 165 volunteers, including 125 men and 40 women. Among them, 143 subjects were 20–30 years old and the others were 30–50 years old. The samples were collected in two separate sessions. In each session, each subject was asked to provide 6 images of each of the left index finger, the left middle finger, the right index finger, and the right middle finger, so 48 images from 4 fingers were collected per subject. In total, the database contains 7,920 images from 660 different fingers. The average time interval between the first and second sessions was about 25 days; the maximum and minimum intervals were 96 and 14 days, respectively. The images were processed by the ROI extraction algorithm described in [35]. In the experiment, we selected 1,200 FKP images of the right index finger of 100 subjects, and these images were resized to 55 × 110 pixels. Figure 7 shows 12 sample images of one right index finger.

Fig. 7 Sample images of one right index finger

The first six images, collected in the first session, are used for training, and the remaining six, collected in the second session, are used for testing. First, LDA/GSVD and MNLDA are performed for feature extraction; then, the nearest neighbor classifier is used for classification. Table 5 lists the recognition rates and the computational time of the two methods. Table 5 shows that MNLDA is much more efficient than LDA/GSVD. However, it also shows that LDA/GSVD and MNLDA obtain different recognition rates, which is not consistent with the results of the face recognition experiments. Why does this happen? From the analysis above, LDA/GSVD and MNLDA have the same discriminating capability when condition (14) is satisfied. The number of training samples is 600 in this experiment, but we find that rank(\( S_{t} \)) = 593 ≠ 600 − 1, so condition (14) is not satisfied and the equivalence of LDA/GSVD and MNLDA does not hold.

Table 5 The recognition rates (%) and computational time (s) of MNLDA and LDA/GSVD on the PolyU FKP database when the first six FKP images are used for training (CPU: Pentium 1.8 GHz, RAM: 1 GB)

6 Conclusion

In this paper, we analyze the NLDA and LDA/GSVD algorithms and show their relationship under a condition that is often satisfied for high-dimensional data. Based on this relationship, the MNLDA algorithm, which is equivalent to LDA/GSVD, is proposed. When the given condition is satisfied, MNLDA has the same discriminating power as LDA/GSVD but is more efficient. We also compare the efficiency and discriminating power of NLDA and LDA/GSVD; like MNLDA, NLDA is more efficient than LDA/GSVD. Face recognition experiments demonstrate the equivalence of LDA/GSVD and MNLDA under the given condition and confirm that MNLDA is more efficient. The FKP recognition experiment shows that the two algorithms have different discriminating power when the given condition does not hold. Moreover, the different discriminating power of NLDA and MNLDA mainly depends on the characteristics of the data; we give our interpretation of this based on the face recognition experiments.