1 Introduction

Vein recognition, a biometric authentication technology with strong resistance to counterfeiting, has attracted much attention. Compared with traditional biometrics such as fingerprints, faces, and irises, vein recognition has two significant advantages: its features are internal to the body, and it inherently supports liveness detection [1]. Finger vein recognition has been applied to ATMs, access control systems, vending machines, and various login products in some countries and regions [2]. However, high-quality finger vein images are difficult to capture, which remains a major challenge for the subsequent recognition process. Because of these unique advantages, broad application scenarios, and remaining challenges, finger vein recognition is attracting a growing number of researchers.

Traditional finger vein recognition systems mostly use hand-crafted features, which are generally sensitive to image quality and to changes in finger pose; in addition, the pre-processing is complicated and the final performance of the system is still limited. To overcome these shortcomings, a few researchers have proposed CNN-based finger vein recognition, which automatically learns and extracts more discriminative features from the original region of interest (ROI) image. With a CNN, the ROI image does not need to be filtered or enhanced, so image pre-processing is greatly simplified. Nevertheless, existing CNN-based finger vein recognition methods mostly adopt large networks [3] or complex multi-step pipelines [4], so they cannot be deployed on hardware platforms with limited computing power and little memory.

Considering the weaknesses of existing CNN-based finger vein recognition methods, this paper introduces a finger vein verification system based on cascade fine-tuning of a CNN, which achieves high recognition accuracy while keeping the network small.

2 Related Works

In recent years, a few researchers have used CNNs for finger vein recognition [4,5,6,7,8,9,10,11]. Existing CNN-based finger vein recognition methods can be roughly divided into the following categories. (1) Finger vein recognition is regarded as a multiclass classification problem [6,7,8]. In [6], the authors proposed a reduced-complexity CNN with four convolutional layers for finger vein recognition. In [7], a CNN with the same structure as in [6] is used, and the fully connected layer is retrained regardless of the pre-training weights. However, methods of this type use finger vein images of the same classes for both training and testing; they therefore only work for 1:N recognition on fixed, small databases, because they cannot recognize finger vein images of classes that were not seen in training. (2) A CNN is trained to extract the vein structure from a single image, and template matching is applied to the resulting binary vein pattern image for verification [9]. This method can handle classes that were not seen in training, but vein pattern extraction and matching are carried out as separate steps, and matching still relies on a traditional algorithm. (3) A pair of homologous or heterogeneous images is treated as a single sample whose label is one or zero [4, 5, 10]. In [4], the difference image of the image pair is used as the CNN input to fine-tune VGGNet-16 [12]. In [5], the authors design a two-channel network and a two-stream network that take a two-channel image as input; however, in their experiments some images of each finger are used for training and the others for testing, so no definite conclusion can be drawn about fingers that were not seen in training. In addition, [11] proposes a finger vein template image generation model based on deep learning and random mapping, motivated by practical application requirements.
Some of the CNN techniques above have also achieved good results in face recognition and image matching [13,14,15].

Therefore, this paper focuses on how to train a finger vein recognition model with high accuracy and a small model size when only a limited number of finger vein images are available. The main contributions of this paper are as follows:

  1. This paper is the first to use the 3C image as the CNN input for finger vein verification, which not only makes it possible to fine-tune an ImageNet pre-trained model but also makes full use of the finger vein images. In addition, this paper compares the performance of the same network with the difference image, the 2C image, and the 3C image as input, and concludes that the network performs best with the 3C image as input.

  2. Because finger vein recognition algorithms are often embedded in small portable devices, a compact network is preferable to a method based on VGGNet-16. For this reason, this paper fine-tunes the lightweight SqueezeNet [16] with the 3C image as input and analyzes its performance under different fine-tuning strategies.

  3. To exploit both the difference information and the self-information of the image pair while still fine-tuning the lightweight SqueezeNet, this paper designs a cascade fine-tune framework that trains on difference images and 3C images in sequence, further improving the performance of the network.

The remainder of the paper is organized as follows: Sect. 3 introduces the proposed method, including how to obtain the 3C image, how to use SqueezeNet, and how to design the cascade fine-tune framework for finger vein recognition. Section 4 describes our experiments and analyzes the experimental results. Finally, Sect. 5 presents our conclusions.

3 Proposed Method

3.1 Acquirement of the 3C Image

There are two ways for a CNN to handle finger vein images of untrained classes. (1) The trained CNN is used to extract features from a single original finger vein image; Fig. 1(a) shows this flowchart. (2) A pair of images to be verified is treated as one sample. In [4], the difference image, obtained by a difference operation between the input image and the enrolled image, is used as the CNN input, as shown in Fig. 1(b). In [5], the 2C image, obtained by concatenating the input image and the enrolled image along the channel dimension, is used as the CNN input, as shown in Fig. 1(c). To fine-tune an ImageNet pre-trained model with the difference image as input, the single channel of the difference image has to be copied to fill the three input channels. Fine-tuning an ImageNet pre-trained model with the 2C image is not possible, because the 2C image has only two channels. Moreover, copying the channel of the difference image merely repeats the difference information and does not fully use the information in the image pair itself. To fine-tune an ImageNet pre-trained model without copying the channel of the difference image, this paper concatenates the difference image with the 2C image along the channel dimension to obtain the 3C image, which is used as the CNN input. This flowchart is shown in Fig. 1(d).

Fig. 1. The CNN structure with the finger vein image as the input
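
The pair representations described in this subsection can be sketched in a few lines of NumPy. This is only a minimal sketch under assumptions that the text does not fix: the difference operation is taken to be the absolute pixel-wise difference, the channel order of the 3C image is (input, enrolled, difference), and the function name `build_pair_inputs` is hypothetical.

```python
import numpy as np


def build_pair_inputs(input_img, enrolled_img):
    """Build the difference, 2C and 3C representations of one image pair.

    Both arguments are 2-D grayscale ROI arrays of identical shape (H, W).
    """
    a = np.asarray(input_img, dtype=np.float32)
    b = np.asarray(enrolled_img, dtype=np.float32)
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError("expected two grayscale images of the same shape")

    # Assumption: "difference operation" = absolute pixel-wise difference.
    diff = np.abs(a - b)
    # 2C image: the two images stacked along a new channel axis -> (H, W, 2).
    two_c = np.stack([a, b], axis=-1)
    # 3C image: 2C image plus the difference image as third channel -> (H, W, 3).
    three_c = np.concatenate([two_c, diff[..., np.newaxis]], axis=-1)
    # Baseline for comparison: difference image copied into three channels.
    diff_copied = np.repeat(diff[..., np.newaxis], 3, axis=-1)
    return diff, two_c, three_c, diff_copied
```

For two ROI arrays of shape (60, 128), the 3C image has shape (60, 128, 3), which matches the three-channel input expected by ImageNet pre-trained models, whereas the 2C image with shape (60, 128, 2) does not.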

3.2 SqueezeNet-Based Finger Vein Recognition

In Fig. 2, the input of the network is a 3C image, and the output is the category, that is, authentic match or impostor match. The SqueezeNet employed in this paper starts with a convolution layer (conv1), continues with eight Fire modules (fire2-fire9), and ends with a convolution layer (conv10). The number of filters per Fire module increases gradually, and the layers conv1, fire3, and fire5 are each followed by max-pooling with a stride of two. Like face recognition, finger vein recognition comprises 1:1 verification and 1:N recognition. When the accuracy of 1:1 verification is high enough, the verification method can also be applied to 1:N recognition. Our method targets the 1:1 verification problem: given a pair of images, the trained network outputs the probability that the pair is an authentic match or an impostor match. (In this paper, the probability that the pair is an authentic match is regarded as the similarity between the two images.)

Fig. 2. Finger vein recognition network based on SqueezeNet
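
How a two-class output can be turned into a similarity score may be illustrated as follows. This is a hedged sketch, not the authors' implementation: it assumes the network ends in two raw class scores passed through a softmax, that index 1 denotes the authentic class, and that a threshold of 0.5 is used for the accept decision; the names `pair_similarity` and `verify` are hypothetical.

```python
import numpy as np


def pair_similarity(logits, authentic_index=1):
    """Softmax over the two class scores; return P(authentic match)."""
    z = np.asarray(logits, dtype=np.float64)
    if z.shape != (2,):
        raise ValueError("expected exactly two class scores")
    z = z - z.max()  # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return float(probs[authentic_index])


def verify(logits, threshold=0.5):
    """Accept the pair when its similarity reaches the (assumed) threshold."""
    return pair_similarity(logits) >= threshold
```

In practice the threshold would be chosen from the score distributions of authentic and impostor pairs rather than fixed in advance.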

3.3 Cascaded Fine-Tune Framework Based on Difference Image and 3C Image

When a pair of images to be verified is treated as one sample, the difference image and the 3C image can be seen as two representations of that sample: the difference image focuses on the difference between the two images, whereas the 3C image retains the self-information of the image pair and also simply describes the difference between them. To make full use of both the difference and the self-information of the image pair while fine-tuning SqueezeNet, this paper proposes a cascade fine-tune framework based on the difference image and the 3C image, shown in Fig. 3. Cascade fine-tuning is a common way of fusing networks. In our implementation, we first fine-tune the pre-trained SqueezeNet with the difference image as input to obtain the first optimized model, and then fine-tune this first optimized model with the 3C image as input to obtain the second optimized model.

Fig. 3. Cascaded fine-tune framework for finger vein recognition
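
The two-stage hand-over of weights can be illustrated with a toy example. The sketch below deliberately replaces SqueezeNet with a small logistic model so that it runs with NumPy alone; only the control flow (stage 1 on difference images, stage 2 starting from the stage-1 weights on 3C images) mirrors the framework. The function names, learning rate, and epoch count are assumptions, and the difference inputs are assumed to be already copied to three channels so that both stages share one input shape.

```python
import numpy as np


def fine_tune(weights, samples, labels, lr=0.05, epochs=100):
    """Continue gradient-descent training of a logistic model from `weights`."""
    w = np.array(weights, dtype=np.float64)  # copy, the input is not mutated
    x = np.asarray(samples, dtype=np.float64).reshape(len(samples), -1)
    y = np.asarray(labels, dtype=np.float64)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))   # predicted P(authentic match)
        w -= lr * (x.T @ (p - y)) / len(y)   # cross-entropy gradient step
    return w


def cascade_fine_tune(pretrained, diff_inputs, three_c_inputs, labels):
    """Stage 1 on difference images, then stage 2 on 3C images."""
    first_model = fine_tune(pretrained, diff_inputs, labels)
    # Stage 2 starts from the stage-1 weights, not from `pretrained`.
    second_model = fine_tune(first_model, three_c_inputs, labels)
    return first_model, second_model
```

The essential design choice is that the second stage is initialized from the first optimized model, so what was learned from the difference images is carried into training on the 3C images.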

4 Experiments

4.1 Experimental Environment and Data

The MMCBNU_6000 dataset contains 6000 images from 100 people, with 3 fingers on each of 2 hands and 10 images per finger (100 × 2 × 3 × 10 = 6000). The dataset provides the original finger vein images at a resolution of 640 × 480 and the corresponding ROI images at a resolution of 128 × 60. To allow a convenient and fair comparison with other algorithms, we directly use the ROI images of this dataset.

The SDUMLA-HMT dataset contains 3816 images from 106 people, with 3 fingers on each of 2 hands and 6 images per finger (106 × 2 × 3 × 6 = 3816). All finger vein images in the dataset are grayscale images with a resolution of 320 × 240. Using an existing ROI extraction algorithm [18], we obtain the corresponding ROI images, which are normalized to a size of 128 × 60.

4.2 The Performance of Fine-Tuning Different Pre-Trained Models

The experimental results are shown in Table 1. C-1 is the method of [4]; replacing its input with the 3C image proposed in this paper gives the C-2 method. The EER of C-2 is lower than that of C-1 on both datasets, which shows that, when the same network is fine-tuned, the 3C image is a better input than the difference image. Next, this paper takes the 3C image as input to fine-tune SqueezeNet. In transfer learning, there is no fixed rule for which layers' weights of the pre-trained model should be reused. To find the best transfer setting on the two datasets, this paper carries out further experiments: the pre-trained weights of SqueezeNet's pre-fire9, pre-fire8, pre-fire7, and pre-fire6 are taken as the initial weights in turn, and the weights of the remaining layers are randomly initialized. SqueezeNet is then fine-tuned with 3C images as input on each of the two datasets. The experimental results are shown in Table 2. On MMCBNU_6000 the D-3 method has the lowest EER, while on SDUMLA-HMT the D-4 method has the lowest EER. Thus, the best way to fine-tune the same network differs from dataset to dataset. Building on these experiments, the cascade fine-tune framework is then evaluated; the results are shown in Table 3 and compared with those above. As shown in Fig. 4, the EER of this method is lower than that of the methods above on both datasets, so cascade fine-tuning SqueezeNet on difference images and then on 3C images gives the best performance among the configurations tested.
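
The partial weight transfer used in the SqueezeNet experiments can be sketched as follows. Several points are assumptions rather than facts from the paper: "pre-fireK" is read here as "all layers up to and including fireK", each layer is simplified to a single weight array, the random initialization is a zero-mean Gaussian with a hypothetical standard deviation, and the name `init_partial_transfer` is illustrative.

```python
import numpy as np

# Layer order as described in Sect. 3.2: conv1, fire2..fire9, conv10.
LAYER_ORDER = ["conv1"] + [f"fire{i}" for i in range(2, 10)] + ["conv10"]


def init_partial_transfer(pretrained, last_transferred, rng, std=0.01):
    """Copy pre-trained weights up to `last_transferred`; randomize the rest.

    `pretrained` maps each layer name to one weight array (a simplification).
    """
    cut = LAYER_ORDER.index(last_transferred)
    weights = {}
    for position, name in enumerate(LAYER_ORDER):
        if position <= cut:
            # Assumed reading of "pre-fireK": this layer keeps its weights.
            weights[name] = pretrained[name].copy()
        else:
            # Remaining layers start from random values.
            weights[name] = rng.normal(0.0, std, size=pretrained[name].shape)
    return weights
```

Calling the function with `"fire9"`, `"fire8"`, `"fire7"`, and `"fire6"` in turn would produce four different starting points for fine-tuning under this reading.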

Table 1. EER of fine-tuning VGGNet-16 with different inputs
Table 2. EER of fine-tuning SqueezeNet in different ways (Here fine-tune means that the pre-trained weights of some layers are used as initial weights)
Table 3. EER based on cascaded fine-tune framework
Fig. 4. EER of fine-tuning various pre-training models
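
Since all comparisons in this section are reported as EER (equal error rate), a small sketch of one common way to estimate it from similarity scores may be helpful. The paper does not state its exact procedure, so the following is an assumption: every observed score is tried as a threshold, a pair is accepted when its score is at or above the threshold, and the EER is taken as the mean of the false acceptance rate and the false rejection rate at the threshold where the two are closest. The function name is hypothetical.

```python
import numpy as np


def equal_error_rate(authentic_scores, impostor_scores):
    """Estimate the EER from similarity scores of authentic and impostor pairs."""
    authentic = np.asarray(authentic_scores, dtype=np.float64)
    impostor = np.asarray(impostor_scores, dtype=np.float64)
    thresholds = np.unique(np.concatenate([authentic, impostor]))

    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostor pairs wrongly accepted
        frr = np.mean(authentic < t)   # authentic pairs wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

With perfectly separated scores the function returns 0; with heavily overlapping scores it approaches 0.5.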

4.3 Comparisons with Other Methods

Table 4 compares our best method, E, with other CNN-based finger vein recognition methods. The proposed method has the best performance on MMCBNU_6000 and slightly worse performance than the other methods on SDUMLA-HMT. As mentioned above, image quality and the ROI extraction process affect subsequent recognition. In addition, the model proposed in this paper is only 5.63 MB in size, much smaller than the models of the other methods. Considering overall performance and practical value, the proposed method is superior to the other algorithms.

Table 4. Comparison of our method with other CNN methods in comprehensive performance

5 Conclusion

In this paper, the 3C image is proposed for the first time as the CNN input, and the pre-trained SqueezeNet is fine-tuned with it. Furthermore, a cascade fine-tune framework based on the difference image and the 3C image is proposed to improve the accuracy of finger vein recognition. First, the ability of three pre-trained models to represent finger vein features is validated on two datasets. Second, the performance of the same network under different inputs is compared, and the experimental results show that the 3C image works well as the network input. Finally, SqueezeNet is fine-tuned in different ways, and the performance of the model is further improved by the cascade fine-tune framework. Compared with traditional feature extraction methods, the proposed method is clearly superior. Compared with other CNN methods, it has good overall performance and high recognition accuracy. How to further improve recognition accuracy on low-quality image datasets while keeping the model small remains a challenging problem. The next step is to improve recognition accuracy further through data augmentation and more refined ROI extraction.