1 Introduction

Palmprint recognition has been widely studied and has become one of the most active biometrics technologies [1-7]. Because the palmprint image contains very rich texture information, palmprint recognition methods can achieve very high accuracy [8-10].

A palmprint recognition system should implement the following procedures: palmprint image capture, feature extraction and palmprint classification. The literature has proposed various palmprint recognition methods. For example, the appearance-based method uses the whole palmprint image to produce holistic features for palmprint recognition [11-19]. In the appearance-based method, the whole region of interest (ROI) image of the palmprint is used as the input, so the feature extraction procedure needs to run only once per sample. Typical appearance-based palmprint recognition methods include the Eigenpalm [12-15], two-dimensional principal component analysis (2DPCA) [16, 20], two-dimensional locality preserving projections [21], Fisherpalm [22] and two-dimensional linear discriminant analysis (2DLDA) [23-25]. Besides the conventional appearance-based methods mentioned above, a special appearance-based method, named the representation-based method, has also been used for palmprint recognition. This method also treats the whole image as the input and extracts holistic features of the sample. For example, Xu et al. [26] exploited a linear combination of all the training samples to represent the test sample and used the representation result to perform palmprint recognition. The representation-based method can obtain higher classification accuracy than conventional appearance-based methods and has attracted increasing attention [26-29].

The representation-based methods can be categorized into two kinds: the \(L_2\) norm based representation method (2NBRM) and the \(L_1\) norm based representation method (1NBRM). The representation methods proposed in [27-31] are typical examples of 2NBRM. The representation methods proposed by Wright et al. [32, 33] are typical examples of 1NBRM. Compared with 1NBRM, 2NBRM has a remarkable advantage in computational efficiency, since 2NBRM does not need to iteratively compute the solution of a linear system. Moreover, 2NBRM can obtain a higher accuracy than 1NBRM. For example, Zhang et al. [27] and Shi et al. [28] illustrated that 2NBRM outperforms the well-known 1NBRMs proposed in [32, 33] in classification accuracy. Moreover, the \(L_2\) norm based two-phase test sample sparse representation method (TPTSSR) proposed in [29] achieved an excellent performance in face recognition. Hereafter, sparse representation means that a linear combination of all the training samples is used to represent the test sample. TPTSSR uses an elaborate scheme to determine the training samples that are best suited to represent the test sample and depends on a weighted sum of these training samples to classify the test sample. The success of 2NBRM also helps people understand the following fact: for the representation-based method, it is the classification strategy, i.e. the strategy of exploiting the contributions of different classes in representing the test sample, rather than the use of the \(L_1\) or \(L_2\) norm, that plays the most important role. Moreover, using the \(L_1\) norm is not the sole way to obtain sparsity; a proper use of the \(L_2\) norm can also lead to a sparse representation [29]. Previous literature has also shown that the key to the representation method seems to be to make the genuine class have the maximum contribution in representing the test sample.
The representation based method can also be viewed as a method in which all the training samples try to provide a good representation for the test sample in a competitive way [34].

Various feature-based palmprint recognition methods, which depend on the line features or coding features extracted from local regions of the palmprint image, have also been used for palmprint recognition [35-42]. For example, the ordinal feature [43], principal line [44] and Gabor feature [45] of the palmprint have been used for palmprint authentication. Among feature-based palmprint recognition methods, the competitive coding method achieves a very good performance [40, 41]. A scheme integrating the appearance-based method and the feature-based method has also been proposed in [46].

We see that conventional appearance-based palmprint recognition methods are easy to implement but usually do not lead to a very high accuracy. In contrast, feature-based palmprint recognition methods can obtain a satisfactory accuracy but have a high computational cost. Compared with these two kinds of methods, the representation-based palmprint recognition method can obtain a better performance. In particular, 2NBRM is also computationally very efficient.

In this paper we propose a novel representation-based palmprint recognition method. The proposed method belongs to the \(L_2\) norm based methods and depends on an approximate representation of the test sample to classify it. This method first assumes that a weighted sum of all the training samples can stand for the test sample and obtains the weight coefficients by solving a linear system. It then takes the sum of all the training samples weighted by these coefficients as the approximate representation of the test sample. The method again expresses the approximate representation as a weighted sum of all the training samples. It then calculates, for each class, the reconstruction error between the approximate representation and the weighted sum of the training samples of that class. The method also computes the normalized distance between the test sample and each class. Finally, the method combines the reconstruction error and the normalized distance between the test sample and a class to classify the test sample.

The proposed method is simple and computationally efficient and has the following rationale: the original test sample data might contain noise, so it is not necessary to represent it accurately. As a result, it is a good way to decompose the test sample into two components, i.e. the interpretable component (the approximate representation) and the remainder component, and to exploit only the interpretable component to perform classification of the test sample. This allows the training samples from different classes to compete more “freely” in representing the test sample and enables the effect of different classes on the representation of the test sample to be evaluated more precisely. In contrast, since the original representation-based method tries its best to interpret the test sample as a linear combination of all the training samples, its classification performance will be affected by the noise in the test sample.

From the viewpoint of numerical analysis, as shown in Sect. 3, the weight vector corresponding to the approximate representation also has a smaller norm, so it is numerically more stable and can perform better in classification of the test sample. Moreover, the distance between the test sample and each class provides another kind of matching score between the test sample and a class, so the fusion of the reconstruction error and distance can lead to higher classification accuracy. The experimental results show that the proposed method outperforms state-of-the-art palmprint recognition methods in classification accuracy.

2 Main steps of fusion method based on reconstruction error and normalized distance

The fusion method based on reconstruction error and distance includes the following main steps. The first step obtains an approximate representation of the test sample. It achieves this by solving a linear system. This step also records the approximate representation of each test sample. Then the second step obtains a linear combination of all the training samples that best approximates the approximate representation of the test sample. Finally, the third step evaluates the contribution of the training samples of each class in representing the approximate representation. After the distance between the test sample and each class is calculated, this distance is combined with the reconstruction error between the approximate representation and the contribution of each class to classify the test sample.

Suppose that there are C classes and each class provides n training samples. Let \(x_{1}^{i},\ldots,x_{n}^{i}\) denote n training samples from the i-th class (i=1,…,C). The algorithm’s steps are described as follows:

Step 1. Let y denote the test sample. Use \(y = \frac{y}{\|y\|}\) and \(x_{j}^{i} = \frac{x_{j}^{i}}{\|x_{j}^{i}\|}\) to convert the test sample and the training samples into unit vectors. Hereafter all the norms are \(L_2\) norms. Suppose that \(y \approx \sum_{i = 1}^{C} \sum_{j = 1}^{n} x_{j}^{i} z_{j}^{i} = XZ\) can be approximately satisfied, where \(X = [x_{1}^{1}, \ldots, x_{n}^{1}, \ldots, x_{1}^{C}, \ldots, x_{n}^{C}]\) and \(Z = [z_{1}, z_{2}, \ldots, z_{nC}]^{T}\). The objective is to minimize both \(\|y - XZ\|^{2}\) and \(\|Z\|^{2}\), i.e. \(\min(\|y - XZ\|^{2} + \sigma\|Z\|^{2})\). According to the Lagrange method, Z should be computed using

$$ Z = \bigl(X^{T}X + \sigma I\bigr)^{ - 1}X^{T}y $$
(1)

where I and σ denote the identity matrix and a small positive constant, respectively. We take XZ as the approximate representation of the test sample y and denote it by \(y_{a}\).
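The computation in Step 1 can be sketched in NumPy (the paper provides no code; the function name `approximate_representation` and the default σ=0.01 are our illustrative assumptions):

```python
import numpy as np

def approximate_representation(X, y, sigma=0.01):
    """Step 1: solve Eq. (1), Z = (X^T X + sigma*I)^{-1} X^T y, after
    normalizing the training samples (columns of X) and the test sample
    y to unit L2 norm; return Z together with y_a = X Z."""
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    y = y / np.linalg.norm(y)
    nC = X.shape[1]
    Z = np.linalg.solve(X.T @ X + sigma * np.eye(nC), X.T @ y)
    y_a = X @ Z  # approximate representation of the test sample
    return Z, y_a
```

Solving the regularized normal equations with `np.linalg.solve` avoids forming the explicit inverse in Eq. (1), which is the numerically preferable choice.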

Step 2. For test sample y, we assume that \(y_{a} \approx XP\), \(P = [p_{1}^{1}, \ldots, p_{n}^{1}, \ldots, p_{1}^{C}, \ldots, p_{n}^{C}]^{T}\), can be approximately satisfied. Let the objective function be \(\min(\|y_{a} - XP\|^{2} + \sigma\|P\|^{2})\). Then P is computed using

$$ P = \bigl(X^{T}X + \sigma I\bigr)^{ - 1}X^{T}y_{a} $$
(2)
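Step 2 reuses the same regularized linear system with \(y_{a}\) on the right-hand side; a minimal sketch (function name ours, and X is assumed to be already column-normalized as in Step 1):

```python
import numpy as np

def second_phase_coefficients(X, y_a, sigma=0.01):
    """Step 2: solve Eq. (2), P = (X^T X + sigma*I)^{-1} X^T y_a,
    expressing the approximate representation y_a as a weighted sum
    of the training samples (columns of X)."""
    nC = X.shape[1]
    return np.linalg.solve(X.T @ X + sigma * np.eye(nC), X.T @ y_a)
```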

Step 3. Use the following equation to evaluate the contribution of the training samples of the i-th class in representing the approximate representation

$$ \mathit{con}_{i} = \sum_{j = 1}^{n} p_{j}^{i}x_{j}^{i} $$
(3)

where \(p_{j}^{i}\) is an entry of P. It is clear that P has nC entries and X has nC columns, and \(p_{j}^{i}\) acts as the weight coefficient of \(x_{j}^{i}\). We calculate the deviation between the approximate representation of the test sample and the contribution of the training samples of the i-th class

$$ \mathit{devia}_{i} = \Vert y_{a} - \mathit{con}_{i}\Vert $$
(4)

For test sample y, \(\mathit{devia}_{i} = \|y_{a} - \mathit{con}_{i}\|\) is normalized using

$$ \mathit{devia}_{i}' = \frac{\mathit{devia}_{i} - \mathit{devia}_{\min}}{\mathit{devia}_{\max} - \mathit{devia}_{\min}} $$
(5)

where \(\mathit{devia}_{\max}\) and \(\mathit{devia}_{\min}\) denote the maximum and minimum reconstruction errors, respectively.
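Equations (3)-(5) can be sketched as follows, assuming the columns of X are ordered class by class with n training samples per class (function name ours):

```python
import numpy as np

def normalized_deviations(X, P, y_a, n):
    """Step 3: for each class i, form its contribution
    con_i = sum_j p_j^i x_j^i (Eq. (3)), measure the deviation
    devia_i = ||y_a - con_i|| (Eq. (4)), and min-max normalize
    the deviations (Eq. (5))."""
    C = X.shape[1] // n
    devia = np.empty(C)
    for i in range(C):
        cols = slice(i * n, (i + 1) * n)
        con_i = X[:, cols] @ P[cols]  # contribution of class i
        devia[i] = np.linalg.norm(y_a - con_i)
    return (devia - devia.min()) / (devia.max() - devia.min())
```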

Step 4. We compute the distance between the test sample and the original training samples of the i-th class using

$$ \mathit{dist}_{i} = \frac{1}{n}\sum_{j = 1}^{n} \bigl\|y - x_{j}^{i}\bigr\| $$
(6)

For test sample y, \(\mathit{dist}_{i}\) is normalized using

$$ \mathit{dist}_{i}' = \frac{\mathit{dist}_{i} - \mathit{dist}_{\min}}{\mathit{dist}_{\max} - \mathit{dist}_{\min}} $$
(7)

where \(\mathit{dist}_{\max}\) and \(\mathit{dist}_{\min}\) denote the maximum and minimum distances, respectively.
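Equations (6)-(7) can likewise be sketched (function name ours; X is again assumed to hold n training samples per class, class by class):

```python
import numpy as np

def normalized_distances(X, y, n):
    """Step 4: mean distance between the test sample y and the n
    training samples of each class (Eq. (6)), followed by min-max
    normalization (Eq. (7))."""
    C = X.shape[1] // n
    dist = np.array([
        np.linalg.norm(y[:, None] - X[:, i * n:(i + 1) * n], axis=0).mean()
        for i in range(C)
    ])
    return (dist - dist.min()) / (dist.max() - dist.min())
```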

Step 5. For test sample y, we calculate the matching score of the test sample with respect to training samples of the i-th class using

$$ \mathit{score}_{i} = a \times \mathit{devia}_{i}' + b \times \mathit{dist}_{i}' $$
(8)

where a and b are the weights of the reconstruction error and distance, respectively.

We consider that a low reconstruction error means that the test sample is accurately reconstructed. If \(h = \arg\min_{i} \mathit{score}_{i}\), we assign test sample y to the h-th class. In other words, we assign the test sample to the class with the smallest matching score.
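Putting Steps 1-5 together, a self-contained sketch of the whole classifier (all names and the default weights a=0.7, b=0.3 are our illustrative choices, not values from the paper):

```python
import numpy as np

def classify(X, y, n, a=0.7, b=0.3, sigma=0.01):
    """Fusion method: columns of X are training samples, n per class,
    ordered class by class; returns the predicted class index."""
    X = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit vectors
    y = y / np.linalg.norm(y)
    C = X.shape[1] // n
    G = X.T @ X + sigma * np.eye(X.shape[1])
    Z = np.linalg.solve(G, X.T @ y)          # Eq. (1)
    y_a = X @ Z                              # approximate representation
    P = np.linalg.solve(G, X.T @ y_a)        # Eq. (2)
    devia = np.array([np.linalg.norm(y_a - X[:, i*n:(i+1)*n] @ P[i*n:(i+1)*n])
                      for i in range(C)])    # Eqs. (3)-(4)
    dist = np.array([np.linalg.norm(y[:, None] - X[:, i*n:(i+1)*n], axis=0).mean()
                     for i in range(C)])     # Eq. (6)
    mm = lambda v: (v - v.min()) / (v.max() - v.min())  # Eqs. (5), (7)
    score = a * mm(devia) + b * mm(dist)     # Eq. (8)
    return int(np.argmin(score))             # class with smallest score
```

On a toy two-class problem with well-separated classes, the test sample is assigned to the class it was drawn from.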

The above steps are repeated until all the test samples have been classified. We summarize the main steps of the proposed method in Table 1.

Table 1 The proposed algorithm

3 Potential rationale of our method

All the training samples are usually not able to provide an accurate representation of the test sample, because the number of training samples is usually smaller than the dimensionality of the samples, so the training samples cannot span the whole sample space.

The first step of our method has the following effect: it can somewhat discard the “noise” embedded in the test sample. The “noise” might consist of two parts. The first part is the conventional noise generated during the imaging process, e.g. electromagnetic noise. The second part is the difference between the test sample and the training samples from the same subject, which is mainly caused by varying illumination, pose and facial expression. Moreover, the approximate representation is the nearest to the test sample among the “possible” representations generated from the training samples: since it is obtained using the criterion of minimizing the \(L_2\) norm of the coefficient vector, it is nearer to the test sample than the approximate representations obtained using other criteria. On the other hand, the “noise” indeed means the information that cannot be represented by the training samples; it is probable that the “noise” differs from the training samples in data structure. After the “noise” is discarded, our method is no longer affected by it and can more easily determine the class that makes the most contribution to representing the test sample. Figure 1 illustrates the accountable component \(y_{a}\) and unaccountable component ε of test sample y. In this figure, ε denotes the “noise”.

Fig. 1
figure 1

The illustration of the accountable component y a and unaccountable component ε of test sample y

It can be seen from Fig. 1 that \(y = y_{a} + \varepsilon\). We assume that the probability of the test sample y being from class \(c_{i}\) (i=1,…,C) is determined by \(\|y - \mathit{con}_{i}\|\), that is

$$ p(c_{i}|y) \propto D_{\max } - \| y - \mathit{con}_{i}\| $$
(9)

where D max stands for a number greater than the maximum value of ∥ycon i ∥. A smaller ∥ycon i ∥ means a greater posterior probability p(c i |y). The classification rule of the proposed method indeed classifies the test sample into the class that has the greatest posterior probability.

(10)

Based on this, the approximate representation \(y_{a}\) of the test sample y has a greater posterior probability than the global representation of y. Therefore, the norm of the weight vector of the approximate representation is always smaller than that of the global representation.

Figure 2 shows four original 2D palmprint test samples from the 2D+3D palmprint database (please refer to Sect. 4) and their approximate representations. Figure 3 shows four original 3D palmprint test samples from the 2D+3D palmprint database and their approximate representations. These approximate representations are obtained under the condition that the first five 3D ROI and 2D ROI images collected per palm in the first session are respectively used as training samples and the first 3D ROI and 2D ROI images per palm collected in the second session are respectively taken as testing samples.

Fig. 2
figure 2

Four original 2D palmprint test samples from the 2D+3D palmprint database and their approximate representations. The four original 2D palmprint test samples are shown in the first row and their approximate representations are shown in the second row

Fig. 3
figure 3

Four original 3D palmprint test samples from the 2D+3D palmprint database and their approximate representations. The four original 3D palmprint test samples are shown in the first row and their approximate representations are shown in the second row

We refer to the vector of weight coefficients of the representation-based classification method as the weight vector. The weight vector is indeed the solution of a linear system. It is clear that when a linear system has a solution with a small norm, the solution is numerically stable and can generalize well. Figure 4 shows the norms of the weight vectors of the global representation and the approximate representation. This figure illustrates that the norm of the weight vector of the approximate representation is always smaller than that of the global representation. As a result, the solution of the approximate representation is always numerically more stable than that of the global representation. This, of course, is helpful when exploiting the solution to recognize the palmprint. Figures 5, 6, 7, 8 show the weight coefficients of the first test sample generated from the global representation and the approximate representation on the 2D palmprint images from the 2D+3D palmprint database.
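The norm relation shown in Fig. 4 can also be checked numerically. Writing \(G = X^{T}X + \sigma I\), the approximate-representation weights satisfy \(P = G^{-1}X^{T}XZ\), which shrinks Z along every eigendirection of \(X^{T}X\), so \(\|P\| \le \|Z\|\). A small sketch (function name ours):

```python
import numpy as np

def weight_vector_norms(X, y, sigma=0.01):
    """Return (||Z||, ||P||): the norms of the weight vectors of the
    global representation (solved for y) and of the approximate
    representation (solved for y_a = X Z)."""
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    y = y / np.linalg.norm(y)
    G = X.T @ X + sigma * np.eye(X.shape[1])
    Z = np.linalg.solve(G, X.T @ y)        # global-representation weights
    P = np.linalg.solve(G, X.T @ (X @ Z))  # approximate-representation weights
    return np.linalg.norm(Z), np.linalg.norm(P)
```

For any X, y and σ>0, the second returned norm never exceeds the first, matching the behavior reported in Fig. 4.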

Fig. 4
figure 4

The norms of the weight vectors of the global representation and approximate representation

Fig. 5
figure 5

The weight coefficients, of the first test sample, generated from the global representation. The first three 2D ROI images collected in the first session are used as training samples

Fig. 6
figure 6

The weight coefficients, of the first test sample, generated from the approximate representation. The first three 2D ROI images collected in the first session are used as training samples

Fig. 7
figure 7

The weight coefficients, of the first test sample, generated from the global representation. The first six 2D ROI images collected in the first session are used as training samples

Fig. 8
figure 8

The weight coefficients, of the first test sample, generated from the approximate representation. The first six 2D ROI images collected in the first session are used as training samples

4 Experiments

In this section, extensive experiments on different palmprint databases are designed to evaluate the effectiveness of the proposed method. For the sake of completeness, we compare the performance of the global method [29], the method based only on the reconstruction error, the method based only on the distance, the algorithms proposed in [7, 26], and our algorithm. In addition, the optimal values of the parameters a and b in formula (8) are chosen in the experiments.

4.1 Experiments on the PolyU palmprint database

The Hong Kong PolyU palmprint database consists of 7752 images captured from 386 palms of 193 individuals (http://www.comp.polyu.edu.hk/~biometrics). The samples were collected in two sessions, with an average interval of around two months between them. We selected 4246 images of 386 different palms (11 samples per palm). All images were cropped to 32×32 pixels.

We used the first five, six, and seven images of each palm as the training set and the rest as the testing set, respectively. It can be seen from Table 2 that the recognition rates of the proposed method vary only slightly with the parameters a and b (a>b). Moreover, the proposed method obtains better recognition performance than the other methods.

Table 2 The error rate for various methods on PolyU palmprint database

4.2 Experimental result on the 2D+3D palmprint database

We also used a 2D+3D palmprint database to perform experiments. This database contains 8000 palmprint samples collected from 400 different palms [37, 38]. Twenty samples were collected from each palm in two separate sessions, with 10 samples captured in each session. The average time interval between the two sessions was one month. Each sample contains a 3D ROI (region of interest) and its corresponding 2D ROI. All images were cropped to 32×32 pixels.

We separately used the first three, four, five, and six 3D ROI and 2D ROI images collected in the first session as training samples and took the first five 3D ROI and 2D ROI images collected in the second session as test samples. Table 3 shows the error rates of various methods. From Table 3 we can see that the proposed method has the best performance.

Table 3 The error rate for various methods on 2D+3D palmprint database

4.3 Experimental result on corrupted palmprint images

In this experiment the 2D+3D palmprint database described in Sect. 4.2 is also used. We first used the Matlab function “imnoise” to add Gaussian noise to the palmprint images. The mean and variance of the Gaussian noise were set to zero and 0.01, respectively.
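For readers without Matlab, an approximate NumPy equivalent of `imnoise(I, 'gaussian', 0, 0.01)` is sketched below (assuming image intensities scaled to [0, 1]; the function name is ours):

```python
import numpy as np

def add_gaussian_noise(img, mean=0.0, var=0.01, seed=None):
    """Add Gaussian noise with the given mean and variance to an image
    with intensities in [0, 1], then clip to the valid range, mimicking
    Matlab's imnoise(I, 'gaussian', mean, var)."""
    rng = np.random.default_rng(seed)
    noisy = img + rng.normal(mean, np.sqrt(var), size=img.shape)
    return np.clip(noisy, 0.0, 1.0)
```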

Table 4 shows the error rates of each method. From Table 4, we can see two main points. First, the proposed algorithm outperforms the other algorithms. Second, the proposed algorithm is more robust than the other algorithms under noisy conditions. From these two points, we can see that the proposed approach can indeed improve palmprint recognition accuracy.

Table 4 The error rate for various methods on corrupted palmprint database

5 Conclusions

As a representation-based classification method, the proposed method factorizes the test sample datum into two components, the interpretable component and the remainder component. The interpretable component reflects the maximum capability of all the training samples in representing the test sample and can be viewed as the composition of the test sample that can be explained, whereas the remainder component is the composition of the test sample that cannot be interpreted by the training samples and might mainly consist of the “noise” in the test sample data. By neglecting the remainder component of the test sample and using the approximate representation to stand for the test sample, the proposed method allows the training samples from different classes to compete more “freely” in representing the test sample. At the same time, the distance between the test sample and each class provides a novel matching score between the test sample and each class. Experimental results indicate the effectiveness of the proposed method.