1 Introduction

Finger vein recognition is a relatively new biometric technology that has been widely used in military and financial applications, since the vein characteristics inside the finger are difficult to replicate [1]. However, owing to environmental factors (e.g., dust attached to the mirror) or human factors (e.g., fingers with skin cracks) in practical applications, the acquired finger vein images contain significant noise superimposed on the clean image, as shown in Fig. 1. Such noise seriously degrades the recognition performance of finger vein systems, which makes finger vein de-noising a challenging task.

Fig. 1 The noise of finger vein images in actual use: dust adhering to the mirror, finger skin cracks

Traditional image de-noising algorithms are mainly divided into two categories: transform domain methods [2,3,4] and spatial domain methods [5,6,7]. Transform domain methods project the noisy image into a transform domain for de-noising and then recover the clean image through the inverse transformation; the process mainly exploits the difference between the distributions of the noisy image and the clean image in the transform domain [2]. Li et al. [3] proposed a de-noising model based on group sparse coding theory, in which the image is transformed into a sparse domain and the clean image is reproduced with rank minimization as the objective. Su et al. [4] proposed a de-noising model based on low-rank matrix approximation, solved with multi-frequency weighted \(l^p\)-norm minimization, which achieved good performance in eliminating both Gaussian noise and real-world noise. Spatial domain methods provide another solution for image de-noising [5], filtering the image directly with convolution. Yang et al. [6] proposed a de-noising algorithm based on non-local mean filtering and a partial differential equation constraint: the image is restored according to the similarity between the region around the noise and its external region, and a fourth-order partial differential equation constrains the restoration process. Thanh et al. [7] proposed a de-noising algorithm with an adaptive switching weighted mean filter, which adaptively adjusts the parameters of the filter kernel according to the distribution of noisy and clean points in the filtering window. Traditional filtering de-noising algorithms mainly extract noise features based on empirical settings and therefore generalize poorly. In particular, owing to the combined effects of dusty mirrors and finger cracks, the noise characteristics of finger vein images are complicated, and it is very difficult to model the vein noise distribution completely with the above traditional filtering algorithms.

Recently, deep learning algorithms have shown superior performance in computer vision tasks. Fu et al. [8] proposed dual variational generation for low-shot heterogeneous face recognition, addressing the large domain discrepancy and the lack of heterogeneous data; this method aims at improving recognition accuracy. Song et al. [9] proposed a novel framework based on active pseudo-labeling to reduce the domain gap between synthetic and real images. Researchers have also explored deep learning for the de-noising problem. De-noising models based on deep learning show better generalization and higher feature mining accuracy than traditional models. Burger et al. [10] used a Convolutional Neural Network (CNN) to denoise images, learning the mapping from noisy images to clean images. However, because shallow features are not fully exploited, CNN-based de-noising tends to lose high-frequency details. Tian et al. [11] proposed an attention-guided CNN de-noising model that increases the influence of the shallow network on the deep network, thus improving de-noising performance on real noise. CNN de-noising models obtain the mapping from noisy to clean images by stacking more convolutional layers, but when finer details must be recovered, this approach struggles to meet the target [12]. Lyu et al. [13] proposed a de-noising model based on the Generative Adversarial Network (GAN), which further restores vein details. GAN [14] adopts the idea of competition and strengthens the mapping from noisy image to clean image through the competition between a generator network and a discriminator network.

Although GAN-based de-noising methods have achieved good results in Peak Signal-to-Noise Ratio (PSNR), visual effect, and other evaluation metrics, they are all designed for images with clear texture. For finger vein images with blurred texture, the de-noising performance drops significantly. The loss function of the GAN de-noising model consists of the adversarial loss and the content loss, where the content loss constrains the gray-level difference between the noisy and denoised images. For a finger vein image with blurred texture, it is difficult to remove the noise and restore texture details using only a gray-level constraint. Therefore, a de-noising model tailored to the texture characteristics of finger vein images is needed.

A finger vein de-noising algorithm based on CS-TCGAN is proposed in this paper. The main contributions are as follows. Firstly, since the content loss alone constrains finger vein images with blurred texture poorly, a texture loss is added to the loss function to recover more vein texture details. Secondly, de-convolution in a neural network generates images by describing the rough structure and then filling in the details; however, the up-sampling in de-convolution easily causes checkerboard artifacts, which leads to the loss of vein information [15]. Thus, a dimension-preserving structure is adopted in the generator network to alleviate this problem. Lastly, a training dataset with salt/pepper noise as the main component and Gaussian, Poisson, and speckle noise as auxiliary components is designed according to the density and characteristics of actual vein noise, so as to obtain a de-noising model with better generalization. Compared with the traditional filtering de-noising algorithm and GAN-based de-noising algorithms, the proposed CS-TCGAN de-noising algorithm performs better in visual effect, PSNR, and recognition performance.

The remainder of the paper is organized as follows. Section 2 describes the structure of the proposed network. Section 3 describes the construction of the proposed training dataset. Section 4 presents experimental results on visual effect, PSNR, and recognition performance. Finally, conclusions are given in Sect. 5.

2 De-noising Algorithm Based on CS-TCGAN

In order to avoid blurred vein texture and discontinuous vein structure after de-noising, the network structure, loss function, and training dataset need to be carefully designed. Firstly, a feature-dimension-preserving structure is adopted for the generator [15], and a residual block is added. Secondly, we design a loss function with an adversarial loss, a content loss, and a texture loss. Lastly, a dedicated training dataset is built by combining salt/pepper, Gaussian, Poisson, and speckle noise.

The CS-TCGAN proposed in this paper consists of a generator network and a discriminator network. The notation of the proposed network is presented in Fig. 2. \({I_\mathrm{noise}}\) denotes the image to be denoised. \(I_\mathrm{clean}\) represents the original clean image. \(I_\mathrm{de\text {-}noise}\) denotes the denoised image. \(I_\mathrm{c}\) represents the conditional input image, which constrains the whole network and guides the de-noising process [16]. Under the guidance of \(I_\mathrm{c}\), the image generated by the generator tends toward the distribution of finger vein images. In this paper, we take \({I_\mathrm{noise}}\) as the conditional input of the network (i.e., \(I_\mathrm{c}\)=\({I_\mathrm{noise}}\)). The objective of the generator network is to generate \(I_\mathrm{de\text {-}noise}\), whose distribution is close to that of \(I_\mathrm{clean}\), given \({I_\mathrm{noise}}\) as the input image. The discriminator network is trained to classify \(I_\mathrm{de\text {-}noise}\) as fake and \(I_\mathrm{clean}\) as real. S&P stands for salt/pepper noise, GN for Gaussian noise, PN for Poisson noise, SN for speckle noise, and FDK denotes the feature-dimension-keeping structure in the generator network. \(l_\mathrm{adv}\) is the adversarial loss, \(l_\mathrm{cont}\) is the content loss, and \(l_\mathrm{lbp}\) represents the texture loss. The dashed lines represent actual image transmission, the dotted lines represent network feedback, and D_result represents the output of the discriminator network.

Fig. 2 The structure of the proposed CS-TCGAN

2.1 Generator Network Architecture

The purpose of finger vein image de-noising is to project the noisy image onto the clean image space while keeping as much vein information as possible. In this paper, the generator network adopts a feature-dimension-preserving structure, which ensures that the vein information can be recovered successfully. Meanwhile, the input \({I_\mathrm{noise}}\) is added to the output of the last convolutional layer and connected to the de-convolutional layer [17]. The size of the receptive field of a convolution affects what it learns [18]; in order to further enhance the recovery of vein details, the size of the convolutional kernel varies with the number of channels, so that detailed information is learned under different receptive fields. The detailed architecture of the proposed generator is shown in Fig. 3.

Fig. 3 Generator network architecture
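To make the feature-dimension-preserving idea concrete, the following is a minimal PyTorch sketch of such a generator. The channel counts, kernel sizes, and the placement of the skip connection and final stride-1 de-convolution are illustrative assumptions rather than the exact configuration of Fig. 3; output normalization is omitted.

```python
import torch
import torch.nn as nn

class DimensionPreservingGenerator(nn.Module):
    """Sketch of a feature-dimension-preserving generator.

    Every convolution uses stride 1 with matching padding, so the spatial
    size of the feature maps never changes. The noisy input is added back
    to the output of the last convolutional layer before a stride-1
    de-convolution, mirroring the skip connection described above.
    Channel counts and kernel sizes are illustrative assumptions.
    """

    def __init__(self, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            # larger kernels in the shallow layers, smaller ones as channels grow
            nn.Conv2d(in_channels, 64, kernel_size=7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, in_channels, kernel_size=3, padding=1),
        )
        # stride-1 de-convolution keeps the spatial dimensions unchanged
        self.deconv = nn.ConvTranspose2d(in_channels, in_channels, kernel_size=3, padding=1)

    def forward(self, noisy):
        out = self.features(noisy)
        out = out + noisy            # skip connection with the noisy input
        return self.deconv(out)
```

Because every layer preserves the spatial resolution, no checkerboard-prone up-sampling is needed and the vein layout of the input is carried through to the output.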

2.2 Discriminator Network Architecture

The objective of the discriminator network is to classify the denoised image (produced by the generator network) as fake, label "0", and the original clean image as real, label "1". The capacity of the discriminator network must be balanced against that of the generator network [19]. If the discriminator network is too strong, the generator network loses the competition, which leads to the failure of de-noising; on the other hand, if the generator network is too strong, the discriminator performs poorly in distinguishing the denoised image from the clean image. Our discriminator network has five convolutional layers. Its detailed architecture is shown in Fig. 4.

Fig. 4 Discriminator network architecture
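As a rough illustration, a five-convolution-layer conditional discriminator could be sketched as follows in PyTorch; concatenating the conditional image along the channel axis, and the particular channel widths and strides, are assumptions made for this sketch, not the exact settings of Fig. 4.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of a five-convolution-layer conditional discriminator.

    The candidate image (denoised or clean) is concatenated with the
    conditional image I_c along the channel axis; channel widths and
    strides are illustrative assumptions.
    """

    def __init__(self, in_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels * 2, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, 4, stride=1, padding=1),   # raw logits; sigmoid is applied in the loss
        )

    def forward(self, image, condition):
        return self.net(torch.cat([image, condition], dim=1))
```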

2.3 Design of Loss Function Based on Texture Loss

The objective function of the proposed CS-TCGAN is expressed as follows [20],

$$\begin{aligned} \min \limits _{G} \max \limits _{D}V(D,G)=E_{x \sim p_{data}(x)}(\log _2D(x|c))+E_{z \sim p_{noise}(z)}(\log _2(1-D(G(z|c)))) \end{aligned}$$
(1)

where G represents the generator network, D represents the discriminator network, \(V(D,G)\) is the objective function, \(D(x|c)\) denotes the probability that the discriminator judges the real sample x as real given the condition c, \(G(z|c)\) represents the sample generated from the input z given the condition c, and \(E(*)\) represents the mathematical expectation over the corresponding distribution. G minimizes the objective function and D maximizes it. Therefore, the loss function of CS-TCGAN consists of the generator loss and the discriminator loss. The loss function of the discriminator network is shown in Eq. (2),

$$\begin{aligned} \begin{aligned} l_\mathrm{D}&=\frac{1}{N} \times \sum _{i=1}^N \left[ S\_CE(D(I_\mathrm{clean},I_\mathrm{c}),1)+S\_CE(D(G(I_\mathrm{noise},I_\mathrm{c}),I_\mathrm{c}),0)\right] \\ S\_CE(logit,label)&=-label \cdot \log _2(sigmoid(logit))-(1-label) \cdot \log _2(1-sigmoid(logit)) \end{aligned} \end{aligned}$$
(2)

where i indexes the pixels in the image and N is the total number of pixels. \(S \_ CE(*)\) is the sigmoid cross-entropy loss, i.e., the cross entropy computed after the sigmoid activation. logit denotes the input to the cross entropy (the discriminator output) and label denotes the target classification result, where \(label\in \{0,1\}\).
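For readers who prefer code, Eq. (2) corresponds to the standard sigmoid cross-entropy. A minimal sketch is given below, assuming the discriminator returns raw logits (as in the sketch after Fig. 4); Eq. (2) is written with \(\log_2\), which differs from the natural logarithm used by PyTorch only by a constant factor.

```python
import torch
import torch.nn.functional as F

def s_ce(logit, label):
    """Sigmoid cross-entropy of Eq. (2), averaged over all elements.

    binary_cross_entropy_with_logits applies the sigmoid internally in a
    numerically stable way; the log base only rescales the loss.
    """
    target = torch.full_like(logit, float(label))
    return F.binary_cross_entropy_with_logits(logit, target)

def discriminator_loss(D, G, I_noise, I_clean):
    """l_D of Eq. (2); here the conditional image is the noisy image itself (I_c = I_noise)."""
    I_c = I_noise
    real_logit = D(I_clean, I_c)
    fake_logit = D(G(I_noise).detach(), I_c)   # detach: only D is updated with this loss
    return s_ce(real_logit, 1) + s_ce(fake_logit, 0)
```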

For finger vein images with blurred texture, texture restoration also needs to be considered. Thus, we introduce Local Binary Patterns (LBP) as a texture constraint. LBP is an effective texture descriptor that measures and extracts local texture information.

The generator loss function of the proposed CS-TCGAN consists of the adversarial loss, the content loss, and the texture loss, as shown in Eq. (3),

$$\begin{aligned} l_\mathrm{G}=\lambda _\mathrm{a}l_\mathrm{adv}+\lambda _\mathrm{c}l_\mathrm{cont}+\lambda _\mathrm{l}l_\mathrm{lbp}, \quad \lambda _\mathrm{a}+\lambda _\mathrm{c}+\lambda _\mathrm{l}=1 \end{aligned}$$
(3)

where \(l_\mathrm{adv}\) is the adversarial loss, \(l_\mathrm{cont}\) is the content loss, and \(l_\mathrm{lbp}\) is the texture loss. \(\lambda _\mathrm{a}\), \(\lambda _\mathrm{c}\), and \(\lambda _\mathrm{l}\) are their weighting coefficients. In order to improve the restoration of finger vein texture while avoiding false veins caused by an overly large texture loss weight, based on the results of extensive experiments this paper sets \(\lambda _\mathrm{a}\)=0.3, \(\lambda _\mathrm{c}\)=0.5, and \(\lambda _\mathrm{l}\)=0.2.

The adversarial loss can be expressed as the probability that the denoised image is considered clean by the discriminator network, which is shown in Eq. (4),

$$\begin{aligned} l_\mathrm{adv}=\frac{1}{N} \times \sum _{i=1}^N S \_ CE(D(G(I_\mathrm{noise},I_\mathrm{c}),I_\mathrm{c}),1) \end{aligned}$$
(4)

The content loss can be expressed as the average difference of gray value between the denoised finger vein image and the corresponding clean image. The specific formula is shown in Eq. (5),

$$\begin{aligned} l_\mathrm{cont}=\frac{1}{N} \times \sum _{i=1}^{N} ||G(I_\mathrm{noise},I_\mathrm{c})-I_\mathrm{clean}|| \end{aligned}$$
(5)

In order to improve vein texture restoration after de-noising, \(l_\mathrm{lbp}\) uses the classic LBP to represent the texture features of the image. The mean squared error between the resulting features is then computed and acts as a texture constraint on G, as shown in Eq. (6),

$$\begin{aligned} l_\mathrm{lbp}=\frac{1}{N} \times \sum _{i=1}^{N} ||LBP(I_\mathrm{de\text {-}noise})-LBP(I_\mathrm{clean})|| \end{aligned}$$
(6)

where \(LBP(*)\) is the local binary pattern texture extraction function, and its formula is shown in the following equation,

$$\begin{aligned} \begin{aligned} LBP(I)&=\sum _{p=0}^{7}s(I(x_c,y_c)-I(x_p,y_p)) \cdot 2^p, \quad (x_p,y_p)\in \varOmega _c \\ s(x)&={\left\{ \begin{array}{ll} 1, \quad x \ge 0 \\ 0, \quad x < 0 \end{array}\right. } \end{aligned} \end{aligned}$$
(7)

where \(\varOmega _c\) is the neighborhood of the central point \((x_c,y_c)\) with radius 1, and \(s(*)\) is the step function defined above.
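To make Eqs. (3)–(7) concrete, the sketch below computes an LBP map on a 3×3 window and combines the three loss terms with the weights of Eq. (3). Two details are our assumptions for this sketch rather than specifications from the text: the hard step \(s(x)\) is relaxed to a sigmoid so the texture term stays differentiable, and the content term of Eq. (5) is implemented as an L1 (gray-level) difference.

```python
import torch
import torch.nn.functional as F

def lbp_map(img, beta=10.0):
    """Differentiable approximation of the LBP of Eq. (7) on a 3x3 window.

    The hard step s(x) is replaced by sigmoid(beta * x) so that gradients can
    flow through the texture loss; this relaxation is an assumption made for
    the sketch. img has shape (B, 1, H, W).
    """
    patches = F.unfold(img, kernel_size=3, padding=1)            # (B, 9, H*W)
    center = patches[:, 4:5, :]                                  # central pixel I(x_c, y_c)
    neighbours = torch.cat([patches[:, :4, :], patches[:, 5:, :]], dim=1)  # 8 neighbours
    s = torch.sigmoid(beta * (center - neighbours))              # soft version of s(x)
    weights = (2.0 ** torch.arange(8, device=img.device, dtype=img.dtype)).view(1, 8, 1)
    code = (s * weights).sum(dim=1)                              # weighted sum over p = 0..7
    return code.view(img.shape[0], 1, img.shape[2], img.shape[3])

def generator_loss(D, denoised, clean, condition,
                   lambda_a=0.3, lambda_c=0.5, lambda_l=0.2):
    """l_G of Eq. (3): weighted sum of adversarial, content, and texture losses."""
    logit = D(denoised, condition)
    l_adv = F.binary_cross_entropy_with_logits(logit, torch.ones_like(logit))   # Eq. (4)
    l_cont = F.l1_loss(denoised, clean)                                         # Eq. (5), gray-level difference
    l_lbp = F.mse_loss(lbp_map(denoised), lbp_map(clean))                       # Eq. (6)
    return lambda_a * l_adv + lambda_c * l_cont + lambda_l * l_lbp
```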

3 Training Dataset Build-up Based on Finger Vein Noise Distribution

Section 2 described the proposed network model. The input of the model is a noisy image and the corresponding clean image, and the pixels of the two images must be in one-to-one correspondence. However, in practical finger vein recognition, a noisy image usually means a finger vein image affected by a dusty mirror or skin cracks, so it is difficult to find a clean finger vein image corresponding to an image with actual noise when building the training dataset. In this section, a training dataset is therefore generated by adding simulated noise to clean finger vein images.

Firstly, finger vein images with noise from practical applications are collected, and the noise density is estimated as the ratio of the number of noisy pixels to the total number of pixels. The results are shown in Table 1.

Table 1 Noise density distribution of noisy images in practical application

Secondly, the particle size of the actual noise and of the simulated noise are compared; the comparison results are shown in Fig. 5. Figure 5a and b show actual noise, and Fig. 5c–f show simulated noise.

Fig. 5 Noisy vein images with different noise types

According to the image comparison, the particle size of salt/pepper noise is the closest to the actual noise, followed by Gaussian noise. Based on the results in Table 1 and Fig. 5, the largest part of our training dataset consists of finger vein images with salt/pepper noise, followed by Gaussian noise, and finally Poisson noise and speckle noise. The composition of the training dataset is shown in Table 2. The different types of noise are applied to different fingers, and each finger contributes 10 finger vein images collected at different times.

Table 2 The composition of the training dataset
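A possible way to synthesize such noisy/clean training pairs is sketched below with NumPy. The density and sigma values are placeholders to be set from the measured densities in Table 1, and the mixture proportions of the four noise types follow Table 2.

```python
import numpy as np

def add_simulated_noise(clean, noise_type, rng, density=0.05, sigma=0.05):
    """Add one type of simulated noise to a clean vein image with values in [0, 1].

    `density` (salt/pepper) and `sigma` (Gaussian/speckle) are placeholder
    values; in practice they are chosen to match the noise densities
    measured in Table 1.
    """
    img = clean.astype(np.float64).copy()
    if noise_type == "salt_pepper":
        mask = rng.random(img.shape)
        img[mask < density / 2] = 0.0            # pepper
        img[mask > 1 - density / 2] = 1.0        # salt
    elif noise_type == "gaussian":
        img = img + rng.normal(0.0, sigma, img.shape)
    elif noise_type == "poisson":
        peak = 255.0
        img = rng.poisson(img * peak) / peak
    elif noise_type == "speckle":
        img = img + img * rng.normal(0.0, sigma, img.shape)
    return np.clip(img, 0.0, 1.0)

# Example: one salt/pepper-dominated training pair (clean_img is a clean vein image)
# rng = np.random.default_rng(0)
# noisy_img = add_simulated_noise(clean_img, "salt_pepper", rng, density=0.08)
```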

4 Experimental Results

The server used in this experiment has an NVIDIA TITAN RTX GPU. The deep learning framework is PyTorch. The batch size is 32, the number of iterations is \(10^5\), and the global learning rate is 0.004. The Adam optimizer is used.
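A condensed sketch of the training loop under these settings is given below, reusing the sketches from Sect. 2. The data loader, the alternating update order, and the default Adam betas are conventional assumptions rather than details reported in this paper.

```python
import torch

G = DimensionPreservingGenerator()            # generator sketch from Sect. 2.1
D = Discriminator()                           # discriminator sketch from Sect. 2.2
opt_g = torch.optim.Adam(G.parameters(), lr=0.004)
opt_d = torch.optim.Adam(D.parameters(), lr=0.004)

for step, (noisy, clean) in enumerate(loader):   # `loader` yields batches of 32 noisy/clean pairs
    # discriminator update, Eq. (2)
    opt_d.zero_grad()
    discriminator_loss(D, G, noisy, clean).backward()
    opt_d.step()

    # generator update, Eq. (3); the conditional input is the noisy image
    opt_g.zero_grad()
    generator_loss(D, G(noisy), clean, noisy).backward()
    opt_g.step()

    if step >= 10**5:
        break
```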

In order to verify the performance of the trained de-noising model, the test dataset consists of two parts, whose composition is shown in Table 3. The fingers in the test dataset do not overlap with those in the training dataset; the different types of noise are applied to different fingers, and each finger contributes 10 finger vein images collected at different times.

Table 3 The composition of the test dataset

In this paper, visual effect, PSNR, and recognition performance are analyzed for the following methods: (1) Traditional: the traditional de-noising method based on an adaptive switching spatial domain filter [7]; (2) CGAN: the CGAN de-noising model [13] trained on randomly selected salt/pepper noise; (3) CS-CGAN: the CGAN de-noising model [13] trained on the training dataset designed in this paper; (4) CS-DPCGAN: the CS-CGAN de-noising model with the dimension-preserving structure; (5) CS-TCGAN: the proposed de-noising model.

4.1 Evaluate De-noising Performance via Visual Effect

In this section, we evaluate the de-noising performance through visual inspection of the denoised images.

Figure 6 shows finger vein images affected by a dusty mirror, denoised by the five different methods. Block 1 in Fig. 6b shows that the traditional filtering method [7] removes the noise but blurs the vein features. Block 2 in Fig. 6c shows that although the texture preserved by the CGAN model [13] is clearer than that of the traditional filtering method [7], the vein edges remain blurred. Block 3 in Fig. 6d shows that the CS-CGAN de-noising model [13] significantly alleviates the vein blurring, but the contrast between the vein and the surrounding background still needs to be improved. Block 4 in Fig. 6e shows that the CS-DPCGAN de-noising model improves the contrast between the vein and the surrounding background, although the vein details are still slightly blurred. Block 5 in Fig. 6f shows that the proposed CS-TCGAN de-noising model resolves the vein blurring and improves the contrast between the vein information and the surrounding background.

Fig. 6 Visual effect of different de-noising methods under dust noise

Figure 7 shows finger vein images affected by skin cracks, denoised by the five different methods. Block 1 in Fig. 7b shows that with the traditional filtering method [7], the vein structure damaged by skin cracks remains poorly connected. Block 2 in Fig. 7c shows that although the CGAN de-noising model [13] improves the continuity of the vein structure, its generalization is weak, leaving broken veins after de-noising. Block 3 in Fig. 7d shows that the generalization of the CS-CGAN de-noising model [13] is enhanced, but the restored vein area becomes blurred after de-noising. Block 4 in Fig. 7e shows that the CS-DPCGAN de-noising model reduces this blurring in certain areas. Block 5 in Fig. 7f shows that with the proposed CS-TCGAN de-noising model, the vein is well restored and the vein edges are sharper.

Fig. 7 Visual effect of different de-noising methods under skin crack noise

4.2 Evaluate De-noising Performance via PSNR

PSNR is an index for evaluating image quality, defined as the ratio of the maximum possible signal power to the power of the corrupting noise; the higher the PSNR, the higher the image quality [21]. The comparison results in Table 4 show that, for different noise types, the de-noising performance of the proposed algorithm improves significantly over the traditional filtering algorithm, CGAN [13], CS-CGAN [13], and CS-DPCGAN, which indicates that the proposed CS-TCGAN de-noising model is more robust to different noise types. The combination of noises is a random combination of salt/pepper, Gaussian, Poisson, and speckle noise.

Table 4 Comparison of PSNR under different de-noising methods
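For reference, the PSNR between a denoised image and its clean counterpart can be computed as below; this is the standard definition for 8-bit images (MAX = 255) rather than a detail specific to this paper.

```python
import numpy as np

def psnr(denoised, clean, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((denoised.astype(np.float64) - clean.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```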

4.3 Recognition Performance

In order to make the comparison more comprehensive, the CNN de-noising model [11] is also included in the recognition experiment. The Receiver Operating Characteristic (ROC) curves in Figs. 8, 9 and 10 show that noise seriously damages the vein texture of finger vein images. Without de-noising, when the False Accept Rate (FAR) is 0, the False Reject Rate (FRR) of finger vein images with simulated noise, dust noise, and skin crack noise is as high as 92.21\(\%\), 92.17\(\%\), and 59.84\(\%\), respectively. With the traditional filtering de-noising algorithm [7], the FRR decreases to 67.46\(\%\), 62.67\(\%\), and 52\(\%\); with the CNN de-noising model [11], it decreases to 63.67\(\%\), 60.56\(\%\), and 51.66\(\%\). Although these algorithms improve the recognition performance, problems such as poor texture recovery remain, as discussed earlier. With the CGAN de-noising model [13], the FRR decreases to 51.89\(\%\), 58.8\(\%\), and 51.34\(\%\). With the CS-CGAN de-noising model, the de-noising performance is further improved and the FRR decreases to 42.12\(\%\), 55.5\(\%\), and 49.84\(\%\), but the vein texture recovery is still incomplete. With the CS-DPCGAN de-noising model, the FRR decreases to 41.35\(\%\), 54.69\(\%\), and 46.69\(\%\); the comparison between CS-CGAN and CS-DPCGAN demonstrates the effectiveness of the dimension-preserving structure. With the proposed CS-TCGAN de-noising model, the problem of incomplete texture repair is resolved, the restoration of vein details is further improved, and the FRR decreases to 39.17\(\%\), 52.5\(\%\), and 43.35\(\%\); the comparison between CS-DPCGAN and CS-TCGAN demonstrates the effectiveness of the texture loss.
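The reported operating point (FRR at FAR = 0) can be extracted from genuine and impostor matching scores as in the hypothetical sketch below; the score arrays and the "higher score means better match" convention are assumptions about the downstream matcher, which is not specified here.

```python
import numpy as np

def frr_at_zero_far(genuine_scores, impostor_scores):
    """FRR at the smallest threshold for which no impostor is accepted (FAR = 0).

    Assumes higher scores mean a better match; the score arrays come from the
    finger vein matcher applied to the denoised images.
    """
    threshold = np.max(impostor_scores)               # accepting only above this keeps FAR = 0
    rejected = np.sum(np.asarray(genuine_scores) <= threshold)
    return rejected / len(genuine_scores)
```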

Fig. 8 Recognition performance of four kinds of simulated noises under different de-noising models

Fig. 9 Recognition performance of dust noise under different de-noising models

Fig. 10 Recognition performance of skin crack noise under different de-noising models

In order to further examine whether the proposed CS-TCGAN indeed preserves texture features, we select two test datasets with 230 images each according to texture richness, namely a texture-rich test dataset (TR-TDS) and a texture-sparse test dataset (TS-TDS). The ROC curves in Fig. 11 show that, when FAR is 0, the FRR of CS-TCGAN is 5.27\(\%\) lower than that of CS-CGAN on images with sparse textures and 7.12\(\%\) lower on images with rich textures. This confirms that the proposed method preserves textures better.

Fig. 11 Recognition performance of images with different texture richness under different de-noising models

5 Conclusions

In order to improve the recognition performance of finger vein technology in noisy environments, a de-noising algorithm for finger vein images based on CS-TCGAN is proposed. Firstly, the optimization process of the texture-constrained de-noising model is investigated. Secondly, the dimension-preserving structure is used to reduce information loss. Finally, the generalization of the de-noising model is improved by using a training dataset constructed according to the characteristics of finger vein noise. The experimental results show that, compared with the traditional filtering de-noising algorithm and the CGAN de-noising model, the proposed CS-TCGAN approach restores the veins better from noisy images.