1 Introduction

As a burgeoning identity authentication technology, finger vein recognition has captured extensive attention from the biometric community. Compared with currently popular biometric identification technologies (e.g. fingerprint recognition, iris recognition, face recognition), finger vein recognition offers the following distinct merits: anti-counterfeiting, user friendliness, living-body recognition and high security for user information [2, 12, 34]. On the grounds of these merits, this recognition technology is regarded as a highly promising solution for individual identification. In practice, because the haemoglobin in blood vessels absorbs more near-infrared radiation than other substances in skin tissue, finger vein images are usually obtained by transmission of near-infrared (NIR) light (700 nm–900 nm) [15]. Unfortunately, since finger veins are located in the inner part of the skin tissue, it is quite challenging to capture clear edges between venous and non-venous regions. Moreover, given that biological tissue is a heterogeneous medium [23], NIR light scattering in skin tissue degrades the quality of finger vein images. Under these limiting conditions, finger vein recognition performance is not satisfactory. Hence, it is important to improve the visibility of finger vein features by removing scattering. To attack these problems, considerable effort has been devoted to improving image quality; the resulting methods fall into two categories, one based on enhancement technology and the other on restoration technology.

Enhancement-based methods

To improve the quality of degraded images, numerous methods based on image enhancement algorithms have been proposed. In [24], a histogram template equalization method was designed to improve the contrast of finger vein images to a certain extent. Fu et al. [5] combined fuzzy theory with the Retinex algorithm to enhance near-infrared images. Yang et al. [29,30,31] used different oriented filtering strategies to emphasize the texture of finger veins and yielded positive results. Shin et al. [18] adopted a four-direction Gabor filter and a Retinex filter to process two finger vein images respectively before fusing them. Though the above methods can enhance finger vein images in some respects, they do not address the critical role of light scattering in the image degradation process, which leads to limited improvement of finger vein visibility.

Restoration-based methods

Owing to scattering in skin tissue during imaging, the collected images suffer quality deterioration. In view of light transmission in human skin tissue, Lee et al. [9, 10] adaptively utilized a depth point spread function (D-PSF) and a constrained least squares (CLS) filter to restore the venous pattern. In [28], taking the optical properties of skin layers into account, a Gaussian-PSF model and two D-PSF models were developed to restore finger vein images step by step. However, for D-PSF-based image restoration, proper estimation of the biological parameters is an arduous task in practice. To settle this matter, Yang et al. [27, 32] further established a Biological Optical Model (BOM) to describe the process of finger vein image degradation, which facilitated detailed description and analysis of image blurring. Although these methods took the effect of light scattering into consideration and produced acceptable restoration results, the estimation of the biological optical model parameters remained laborious, limiting their usage in practice.

In view of the deficiencies of these methods, we propose a novel and efficient CNN-based finger vein restoration method that is capable of restoring images end-to-end. The contributions of this paper are summarized as follows:

  1.

    Instead of separately estimating the non-scattered transmission matrix and the intensity of scattered radiation as most previous methods did, an improved biological optical model is presented that integrates the non-scattered transmission matrix and the intensity of scattered radiation into one predictor variable.

  2.

    Furthermore, to remove scattering effectively, an end-to-end finger vein image scattering removal model, abbreviated FVSR-Net, is designed with the assistance of a light-weight CNN.

  3.

    Compared with other representative methods, our proposed model overcomes light scattering interference and achieves better visual performance and a lower equal error rate (EER) in finger vein recognition.

The rest of this paper is organized as follows. Section 2 introduces the process of finger vein degradation and the establishment of the biological optical model. In Section 3, the proposed improved biological optical model and the FVSR-Net framework are presented in detail. We then report the experimental results in Section 4, and conclude the paper in Section 5.

2 Related work

Human skin is composed of three layers, the epidermis, dermis and subcutaneous tissue, and it can be regarded as a collection of absorbing and scattering particles. NIR light propagating through human skin tissue is refracted, absorbed and scattered by these particles, so we cannot assume that light propagates in a straight line without scattering. Actually, from a bio-photonics perspective, transmitted light consists of ballistic photons, snake photons and diffuse photons, as shown in Fig. 1. These three kinds of photons propagate through skin tissue in different ways. Ballistic photons travel in a straight line through the medium. Snake photons undergo slight scattering, but still propagate in a forward or near-forward manner. Diffuse photons undergo multiple scattering and propagate randomly. Consequently, NIR light in skin tissue inevitably suffers a series of multiple scattering events, which is the main cause of blurred finger vein images.

Fig. 1
figure 1

The motion of photons when light travels through skin tissue

To depict the effects of direct attenuation and scattering attenuation on finger vein imaging, our previous work [32] capitalized on biological optics and introduced the finger vein image scattering removal model. The biological optical model developed in [32] can be described as

$$ I(x)={I}_0(x)T(x)+\left(1-T(x)\right){I}_r(x) $$
(1)

where the vector x represents the pixel coordinates [x, y], I(x) denotes the captured finger vein image, and I0(x) is the scatter-free latent finger vein image to be recovered. Ir(x) represents the local background illumination map, whose value is bound up with the local optical properties of skin tissues. T(x) is the non-scattered transmission map, which can be further expressed as

$$ T(x)={e}^{-\mu D(x)} $$
(2)

where μ is the scattering coefficient of skin tissue and D(x) is the depth of the object in the skin layer. The first term of (1), I0(x)T(x), represents the direct attenuation component of the incident light in skin tissues. The second term of (1), (1 − T(x))Ir(x), approximates the scattering component arising from skin tissues. As shown in Fig. 2, under the combined effect of direct attenuation and scattering attenuation, the collected finger vein image appears blurred and degraded. Given Ir(x) and T(x), I0(x) can easily be reconstructed from (1).
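As a numerical illustration, the degradation process of (1) and (2) can be sketched as follows; all arrays and parameter values here are synthetic stand-ins, and the transmission is written in its decaying Beer–Lambert form:

```python
import numpy as np

# Synthetic scatter-free image I0, vein depth map D, and background
# illumination Ir (all values illustrative, normalized to [0, 1]).
rng = np.random.default_rng(0)
I0 = rng.uniform(0.0, 1.0, size=(100, 200))
D = rng.uniform(0.5, 2.0, size=(100, 200))   # depth of veins in the skin layer
Ir = np.full((100, 200), 0.6)                # local background illumination
mu = 0.8                                     # scattering coefficient of skin tissue

# Eq. (2): non-scattered transmission map (deeper veins transmit less light).
T = np.exp(-mu * D)

# Eq. (1): observed image = direct attenuation + scattering component.
I = I0 * T + (1.0 - T) * Ir
```

Because the observed image is a pixel-wise blend of I0 and Ir weighted by T, it stays within the intensity range of its inputs.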

Fig. 2
figure 2

Schematic diagram of light substance interaction in skin tissue

Conventional finger vein image restoration methods put the emphasis on estimating T(x) and Ir(x) separately. For instance, [32] coped with scattering removal by proposing the biological optical model (BOM): Ir(x) was computed by considering the interaction of particles around the object, and T(x) was estimated from the last term of (1) using the method of [20]. After estimating Ir(x) and T(x), I0(x) can be calculated with little hindrance. To describe the forward probability of the scattered energy more representatively, [27] further improved BOM by multiplying the last term of (1) by a weighting factor α1, yielding the Weighted Biological Optical Model (WBOM); its estimation of Ir(x) and T(x) was based on anisotropic diffusion and Gamma correction. Although the approaches mentioned above improved visual and recognition performance to a certain extent, they did not explore a one-step model in which T(x) and Ir(x) are integrated into one variable. To improve upon and augment the previous work, we take advantage of a multi-scale CNN to restore finger vein images.

3 The proposed restoration method

In this section, the proposed FVSR-Net is expounded. The first subsection introduces the improved biological optical model, which enables end-to-end output of restored images. The second subsection presents the architecture and parameter settings of FVSR-Net in detail.

3.1 Improved biological optical model

As explained in the related work, it is important to choose parameter estimation methods appropriately according to the characteristics of different datasets. Based on the BOM in (1), the restored finger vein image can be generated by

$$ {I}_0(x)=\frac{1}{T(x)}I(x)-\frac{\left(1-T(x)\right){I}_r(x)}{T(x)} $$
(3)

The traditional methods put forward in [27, 32] are incapable of directly minimizing the restoration error on I0(x). Such indirect optimization causes errors to accumulate, or even be amplified, when T(x) and Ir(x) are combined to calculate I0(x) in (3). Under this circumstance, a new variable E(x), defined in (5), is adopted to serve as a bridge connecting T(x) and Ir(x). In addition, the existence of E(x) makes it possible to minimize restoration errors directly in the pixel domain. Based on the description above, the BOM in (3) can be re-expressed as:

$$ {I}_0(x)=E(x)I(x)-E(x)+a $$
(4)

where

$$ E(x)=\frac{\frac{1}{T(x)}\left(I(x)-{I}_r(x)\right)+{I}_r(x)-a}{I(x)-1} $$
(5)

In this re-expressed formula, E(x) is regarded as the estimation map of the finger vein image, and it can roughly outline the veins; a is a constant bias with a default value of 1. Moreover, the joint estimation of T(x) and Ir(x) allows the two to constrain each other mutually, giving rise to a more credible estimation map. We then concentrate on constructing an input-adaptive CNN model, whose weights change with the input finger vein images, so that the model can minimize the restoration error between the output I0(x) and the ground truth. To justify the importance and effectiveness of jointly estimating T(x) and Ir(x), we conduct a comparative experiment against the baseline [32]: the baseline estimates T(x) and Ir(x) in separate steps, while the proposed method learns them together in E(x). As observed in Fig. 3, the restoration performance of the baseline is overshadowed by FVSR-Net. From the red boxes marked in Fig. 3 (b), it can be seen that parts of the veins are even broken owing to error accumulation, which precisely emphasizes the importance of joint estimation.
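The algebraic equivalence of the one-step form (4)–(5) and the two-step form (3) can be checked with a short sketch; all maps below are synthetic stand-ins, not values estimated from real images:

```python
import numpy as np

rng = np.random.default_rng(1)
I = rng.uniform(0.1, 0.9, size=(8, 8))   # observed image (synthetic)
T = rng.uniform(0.2, 0.9, size=(8, 8))   # transmission map (synthetic)
Ir = rng.uniform(0.3, 0.7, size=(8, 8))  # background illumination (synthetic)
a = 1.0                                  # constant bias, as in the paper

# Eq. (3): two-step restoration from T and Ir separately.
I0_direct = I / T - (1.0 - T) * Ir / T

# Eq. (5): the bridging variable E(x) (note I < 1 here, so I - 1 != 0).
E = ((I - Ir) / T + Ir - a) / (I - 1.0)

# Eq. (4): one-step restoration through E(x).
I0_via_E = E * I - E + a
```

Expanding E(x)(I(x) − 1) + a recovers exactly the right-hand side of (3), which the sketch confirms numerically.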

Fig. 3
figure 3

Visual comparison between FVSR-Net using (4) and baseline using (3). (a) Input images from Dataset A (to be introduced in Section 4.1); (b) Baseline using (3); (c) FVSR-Net using (4)

3.2 End-to-end scattering removal network

As shown in Fig. 4 (a), FVSR-Net comprises two parts: an E-Net that estimates E(x) from the input image I(x), and a degraded-image restoration module that outputs the restored image end-to-end. The architecture of E-Net is depicted in Fig. 4 (b). It is the critical module of FVSR-Net, responsible for estimating the relative scattering level and generating the estimation map. Our network design is based on the following considerations:

Fig. 4
figure 4

The framework and configuration of FVSR-Net. (a) The framework of FVSR-Net; (b) E-net architecture used in FVSR-Net

Early fusion of different feature maps

Classic works such as DenseNet [7] and U-Net [17] show that fusing features of different scales is a vital means of improving image processing performance. To enable E-Net to take the semantics of the whole input image into account and impose coherence among local structures, we merge the features extracted at different scales into more discriminative fused features. Inspired by DenseNet [7], E-Net includes 3 dense blocks, each made up of two convolutions followed by a concatenation layer. The concatenation layers are then convolved with larger convolutional kernels to gain richer semantic information.
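A minimal PyTorch sketch of such a dense block is given below; the channel counts and kernel sizes are illustrative, not the exact Table 1 configuration:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Schematic dense block: two convolutions whose outputs are
    concatenated with the block input, so later layers can mine
    features from non-adjacent layers."""
    def __init__(self, in_ch=8, growth=8):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(in_ch, growth, 3, padding=1), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(growth, growth, 3, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        # Concatenation layer: fuse the input with both feature maps.
        return torch.cat([x, f1, f2], dim=1)

x = torch.randn(1, 8, 100, 200)
y = DenseBlock()(x)   # output channels: 8 (input) + 8 (f1) + 8 (f2) = 24
```

In the full E-Net the concatenated output would then feed a larger-kernel convolution, as described above.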

Various filter sizes design

In addition, recent attention has been paid to the use of various filter sizes within one CNN. In [16], coarse-scale feature maps are fed to a fine-scale network to generate a refined transmission map, and the inception architecture in GoogLeNet [19] adopts parallel convolutions with a variety of filter sizes. Since the finger vein pattern is a complex texture feature containing many detailed vein branches, employing various filter sizes enriches the receptive fields when features are extracted at different layers. Inspired by these works, E-Net builds multi-scale features by applying convolutional filters of different sizes, each followed by a Rectified Linear Unit (ReLU).

Light-weight structure

Moreover, to improve training speed and accelerate convergence in practice, a light-weight structure is adopted. We introduce 1 × 1 convolution kernels to decrease the number of parameters without losing network performance, and employ only ten convolutional layers in E-Net (the specific parameter settings of the 10 convolutional layers are listed in Table 1). The later experiments show that this design achieves good results in fewer epochs and less time.
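The parameter saving from a 1 × 1 bottleneck can be illustrated with a quick count; the channel numbers here are hypothetical, not taken from Table 1:

```python
# Parameter count of a conv layer: k*k*C_in*C_out weights (+ C_out biases).
def conv_params(k, c_in, c_out, bias=True):
    return k * k * c_in * c_out + (c_out if bias else 0)

# Reducing 64 channels to 16 with a 1x1 bottleneck before a 3x3 conv
# is far cheaper than a single wide 3x3 conv over all 64 channels.
wide = conv_params(3, 64, 64)                               # 36928 params
bottleneck = conv_params(1, 64, 16) + conv_params(3, 16, 64)  # 10320 params
```

The bottleneck variant uses less than a third of the parameters while preserving the spatial receptive field of the 3 × 3 stage.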

Table 1 Network parameter setting details

Overall, such a design brings good practical value. It not only compensates for the spatial information lost during convolution, but also fully mines features from non-adjacent layers. The whole-image information is likewise fully utilized in the transmission process, which guarantees a good restoration effect.

Besides, to further reveal the merits of the multi-scale design, we construct a simple CNN model without any concatenation, i.e. “CONV1 → CONV2 → CONV3 → ⋯ → CONV10” (with the same parameter settings as in Table 1). As Fig. 5 shows, it is difficult to estimate E(x) with a CNN structure lacking concatenation; in Fig. 5 (b) especially, the boundaries between veins and background are even more blurred than in the original images.

Fig. 5
figure 5

Visual comparison of CNN without concatenation and FVSR-Net. (a) Original images; (b) Restored images using CNN without concatenation; (c) Restored images using FVSR-Net

In addition, although CNNs excel at extracting image features, in our task the assistance of the improved BOM is also indispensable. We therefore compare two approaches: a CNN model without the improved BOM and FVSR-Net based on the improved BOM. As shown in Fig. 6, FVSR-Net achieves a more complete restoration of the original image, and a purer background is obtained with our method. This confirms that the assistance of the improved BOM is effective and necessary.

Fig. 6
figure 6

Visual comparison of CNN without improved BOM and FVSR-Net. (a) Original images; (b) Restored images using CNN without improved BOM; (c) Restored images using FVSR-Net

4 Experiments and results

In this section, we demonstrate the effectiveness of the proposed method through several experiments and compare it with other current approaches. The experimental results indicate that FVSR-Net performs satisfactorily in scattering removal and finger vein recognition.

4.1 Datasets

  (1)

    Dataset A used in our experiments is collected from a lab-made finger vein image acquisition system with a 760 nm NIR LED array source, as shown in Fig. 7 (a); Region-of-Interest (ROI) images are then obtained from the original images by the localization and segmentation method proposed in [25]. Figure 7 (b) shows some preprocessed samples from Dataset A. The homemade dataset includes a total of 5850 finger vein images from 585 individuals, with 10 images per individual. We randomly divide Dataset A into two parts: 5270 finger vein images for training and 580 for validation.

Fig. 7
figure 7

Finger vein image acquisition. (a) Our lab-made finger vein image acquisition system; (b) Samples of lab-made dataset

  (2)

    Dataset B is from the Shandong University finger vein dataset [33] (abbreviated as the SDU dataset). Here, 106 volunteers contributed finger vein images of the index, middle and ring fingers of both hands. In one session, each finger provides 6 images, so 3816 images were obtained in total. To test the proposed FVSR-Net, we randomly select 100 fingers, for a total of 600 finger vein images. ROI images of size 200 × 100 are then obtained via the same method as for Dataset A.

4.2 The acquirement of approximate ground truth

As a matter of fact, it is practically impossible to obtain high-visibility venous images, as described in Section 1. Consequently, unlike other degraded-image restoration tasks [1], the troublesome problem in recovering finger vein images is acquiring reliable ground truth corresponding to the collected images. Considering that the vein backbone and branches are affected comparably by scattering when incident light propagates through skin tissue, we can reasonably regard the vein backbone as the approximate ground truth. As depicted in Fig. 8, extracting the finger vein backbone involves several steps.

Fig. 8
figure 8

The process of acquiring finger vein backbone

First, the original finger vein images are preprocessed to determine stable vein backbone regions before the vein backbone is extracted. As shown in Fig. 9 (b), the vein backbone regions are faintly visible after preprocessing, but are still surrounded by a large amount of noise. The band-pass property of the Gabor wavelet and its direction selectivity make the Gabor transform well suited to noise reduction and local feature analysis in image processing. Hence, finger vein image enhancement is then performed by even-symmetric Gabor filters [26], which can be expressed as:

$$ {G}_{mk}^e\left(x,y\right)=\frac{\gamma }{2{{\pi \sigma}_m}^2}\exp \left\{-\frac{1}{2}\left(\frac{x_{\theta_k}^2+{\gamma}^2{y}_{\theta_k}^2}{{\sigma_m}^2}\right)\right\}\times \left(\cos \left(2\pi {f}_m{x}_{\theta_k}\right)-\exp \left(-\frac{v^2}{2}\right)\right) $$
(6)

where m is the scale index and k is the orientation index; m and k are set to 3 and 8, respectively. Finally, more stable vein backbone regions with less noise are obtained, as shown in Fig. 9 (c).
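A sketch of the even-symmetric Gabor filter bank of (6) is shown below; the kernel size and the σ, f and v values are illustrative defaults rather than the settings used in our experiments:

```python
import numpy as np

def even_gabor_kernel(size=15, sigma=3.0, gamma=1.0, f=0.1, theta=0.0, v=2.0):
    """Even-symmetric Gabor kernel following Eq. (6); parameter values
    are illustrative defaults, not the paper's settings."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates to orientation theta.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = (gamma / (2 * np.pi * sigma**2)) * np.exp(
        -0.5 * (x_t**2 + gamma**2 * y_t**2) / sigma**2)
    # The exp(-v^2/2) term suppresses the DC response of the cosine carrier.
    return envelope * (np.cos(2 * np.pi * f * x_t) - np.exp(-v**2 / 2))

# A bank of m = 3 scales and k = 8 orientations, as in the paper.
bank = [even_gabor_kernel(theta=k * np.pi / 8, sigma=2.0 + m)
        for m in range(3) for k in range(8)]
```

Each image would be convolved with every kernel in the bank and the oriented responses combined to enhance the vein texture.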

Fig. 9
figure 9

Ground truth acquisition. (a) Original images; (b) The results after preprocessing; (c) Finger vein backbone

What calls for special attention is that the ground truth obtained by the above method is approximate rather than completely accurate. In point of fact, finger vein images suffer many disturbances during imaging, such as uneven brightness, position variations, and interference from other substances in skin tissue. These interferences are random, unavoidable and unstable, and finally appear as texture branches after image processing, as observed in Fig. 9 (c).

4.3 Implementation details

For training FVSR-Net, we adopt Adam [8] as the optimizer with a weight decay of 0.0001 and a momentum of 0.9. The learning rate and batch size are set to 0.001 and 4, respectively. The network weights are initialized with Gaussian random variables. We also clip gradients to restrict their values to [−0.1, 0.1]. The FVSR-Net model is trained for 10 epochs with the PyTorch framework on an NVIDIA GTX 1080Ti GPU with an Intel Xeon(R) Silver 4110 CPU @ 2.10 GHz and 32 GB RAM.
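The stated training configuration can be sketched in PyTorch as follows; the one-layer model and random tensors are stand-ins for FVSR-Net and the real data:

```python
import torch
import torch.nn as nn

# Stand-in model; the real network would be FVSR-Net's E-Net.
model = nn.Conv2d(1, 1, 3, padding=1)
# Adam with lr = 0.001, weight decay = 0.0001; beta1 = 0.9 plays the
# role of the stated momentum.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                             betas=(0.9, 0.999), weight_decay=0.0001)

x = torch.randn(4, 1, 100, 200)       # batch size 4
target = torch.randn(4, 1, 100, 200)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
# Clip each gradient element to [-0.1, 0.1] before the update step.
torch.nn.utils.clip_grad_value_(model.parameters(), 0.1)
optimizer.step()
```

One such step would be repeated over all training batches for the stated 10 epochs.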

Mean Square Error (MSE) [37] is often applied as a loss function, but a traditional MSE-based loss cannot meet the demand of assessing images in a way that imitates the human visual system. Considering that finger vein image restoration is a real-world application, a perceptually motivated metric should be employed in addition to MSE. The Structural Similarity Index (SSIM) [21] is a perception-based model that is well suited to evaluating the perceptual quality of images. To produce visually pleasing finger vein images, we combine SSIM and MSE as the loss function. In our experiments, given a training sample patch P, the MSE loss function can be written as:

$$ {\mathrm{\ell}}^{MSE}(P)=\frac{1}{N}{\sum}_{p\in P}{\left[I(p)-K(p)\right]}^2 $$
(7)

where N is the number of pixels in patch P, p is the index of pixel [x, y], and I(p) and K(p) are the pixel values of the generated finger vein image and the ground truth image, respectively. The SSIM formula and SSIM loss function are defined as:

$$ SSIM(P)=\frac{2{\mu}_x{\mu}_y+{C}_1}{\mu_x^2+{\mu}_y^2+{C}_1}\cdot \frac{2{\sigma}_{xy}+{C}_2}{\sigma_x^2+{\sigma}_y^2+{C}_2} $$
(8)
$$ {\mathrm{\ell}}^{SSIM}(P)=\frac{1}{N}{\sum}_{p\in P}1- SSIM(p) $$
(9)

where \( {\mu}_x=\frac{1}{N}\sum \limits_{p=1}^N{x}_p \) is the mean luminance of x, \( {\sigma}_x^2=\frac{1}{N}\sum \limits_{p=1}^N{\left({x}_p-{\mu}_x\right)}^2 \) is the variance of x, and \( {\sigma}_{xy}=\frac{1}{N-1}\sum \limits_{p=1}^N\left({x}_p-{\mu}_x\right)\left({y}_p-{\mu}_y\right) \) is the covariance of x and y; the constants C1 and C2 prevent the denominators from being 0. Combining the MSE loss function and the SSIM loss function, the final loss function used in this task can be written as:

$$ {L}_{loss}={\mathrm{\ell}}^{MSE}(P)+{\mathrm{\ell}}^{SSIM}(P) $$
(10)

4.4 Finger vein images restoration

After training FVSR-Net, the restoration results for degraded finger vein images are shown in Fig. 10. The estimation maps in Fig. 10 (b) outline the vein backbone roughly, and the corresponding extracted vein backbones in Fig. 10 (d) are interconnected and abundant. We also observe that the scattering effect is suppressed effectively, and the contrast between venous regions and background is increased notably.

Fig. 10
figure 10

Finger vein image restoration. (a) Original images I(x); (b) Estimation map E(x); (c) E(x)I(x); (d) The restored images I0(x); (e) Approximate ground truth

Qualitative visual comparison results

As illustrated in Fig. 11, we evaluate FVSR-Net against several representative approaches for restoring finger vein images: the biological optical model (BOM) [32] and the weighted biological optical model (WBOM) [27]. Additionally, a dehazing-based restoration method (AOD-Net) [11] is directly applied to deblur finger vein images, simply treating degraded finger vein images as hazy images. As displayed in Fig. 11 (b), restoration results based on BOM are somewhat sensitive to noise, even though BOM performs well on the recovery task. Figure 11 (c) shows better restoration performance than Fig. 11 (b), but compared with Fig. 11 (d), scattering residue in Fig. 11 (c) is still evident. The dehazing-based method in Fig. 11 (d) removes scattering satisfactorily, but the contrast between the finger vein region and background can still be improved. Overall, the experimental results in Fig. 11 (e) imply that the proposed FVSR-Net is superior to these methods in terms of finger vein information preservation and scattering removal, and is more visually faithful to the ground truth.

Fig. 11
figure 11

Image restoration comparison. (a) Original images; (b) BOM [32]; (c) WBOM [27]; (d) AOD-Net [11]; (e) The proposed method

In practical application, we find that the WBOM algorithm does not perform well in restoring strongly scattered and low-quality images, even though it yields pleasing results on weakly scattered images. To verify the advantage of FVSR-Net, five low-quality finger vein images with low contrast and serious distortion are selected in Fig. 12 (a). The restoration results of the BOM and WBOM methods are shown in Fig. 12 (b) and Fig. 12 (c). It is not hard to see that the images output by BOM and WBOM still make it difficult to distinguish venous regions from the background, which undoubtedly leads to inferior recognition performance. It is particularly worth mentioning that the proposed FVSR-Net not only obtains better restoration results for weakly scattered images, but also makes up for the inadequacy of traditional algorithms in restoring strongly scattered finger vein images, as shown in Fig. 12 (e).

Fig. 12
figure 12

Low quality image restoration comparison. (a) Original images; (b) BOM [32]; (c) WBOM [27]; (d) AOD-Net [11]; (e) The proposed method

Quantitative comparison results

Beyond the previous visual comparisons, PSNR [13], SSIM and Scoot results for restoration image quality are reported in Tables 2 and 3. Since FVSR-Net is trained under the MSE and SSIM losses, it attains higher PSNR and SSIM than the other methods. Scoot [4] is a state-of-the-art perceptual metric that simultaneously considers block-level spatial structure and co-occurrence texture statistics, so we also adopt Scoot to evaluate the restoration performance of FVSR-Net. More attractively, FVSR-Net still maintains a clear Scoot advantage over the other competitors, even though Scoot is not directly used as an optimization criterion.

Table 2 Average PSNR, SSIM and Scoot results on Test Set A
Table 3 Average PSNR, SSIM and Scoot results on Test Set B

4.5 Running time comparison

Moreover, running-time comparisons between the proposed method and other typical restoration methods are presented. The first image of each of the 100 individuals in Dataset B is selected for the time cost comparison, and the average time for restoring the 100 images with the various approaches is given in Table 4. The time cost experiments for BOM and WBOM are performed in MATLAB R2014a on a PC with an i5-4590 CPU @ 3.30 GHz and 4 GB RAM; those for AOD-Net and FVSR-Net are performed with the PyTorch framework on an Intel Xeon(R) Silver 4110 CPU @ 2.10 GHz with 32 GB RAM, without GPU acceleration. As displayed in Table 4, restoring one image with FVSR-Net takes much less time than with the traditional methods, owing to the light-weight design of E-Net. Meanwhile, the time consumption of our restoration approach is slightly larger than that of AOD-Net, plausibly because FVSR-Net contains more convolutional layers. On the whole, it still meets real-time requirements and is acceptable in practical applications.

Table 4 The average restoration time of 100 images in Dataset B (image size: 200 × 100)

4.6 Matching test

To further demonstrate that the proposed method also improves recognition accuracy, a simple but effective matching method (the matrix matching method) is employed in our experiments. The matrix matching method calculates the correlation coefficient between two matrices, where the similarity of two matrices is expressed as

$$ {M}_s=\frac{\sum \limits_{j=1}^m\sum \limits_{k=1}^n\left(A\left(j,k\right)-\overline{A}\right)\left(B\left(j,k\right)-\overline{B}\right)}{\sqrt{\left(\sum \limits_{j=1}^m\sum \limits_{k=1}^n{\left(A\left(j,k\right)-\overline{A}\right)}^2\right)\left(\sum \limits_{j=1}^m\sum \limits_{k=1}^n{\left(B\left(j,k\right)-\overline{B}\right)}^2\right)}} $$
(11)

where \( \overline{A} \) and \( \overline{B} \) represent the means of matrices A and B, respectively. To assess finger vein recognition performance, 1000 finger vein images of 100 fingers from Dataset A form test set A, and 600 finger vein images of 100 fingers from Dataset B form test set B. The ROC (Receiver Operating Characteristic) curves of the different restoration methods are plotted in Fig. 13 (a) and Fig. 13 (b); the x-axis is the FAR (False Acceptance Rate), the y-axis is the FRR (False Rejection Rate), and the EER (Equal Error Rate) is the error rate at which FAR and FRR are equal. Table 5 lists the EERs corresponding to Fig. 13 (a) and Fig. 13 (b). We can clearly observe from Fig. 13 and Table 5 that the proposed FVSR-Net achieves the lowest EER and the best recognition performance, indicating that FVSR-Net represents finger vein features reliably and effectively.
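Equation (11) is the Pearson correlation of the two matrices and can be sketched as follows (the probe and gallery images are synthetic stand-ins):

```python
import numpy as np

def matrix_match_score(A, B):
    """Eq. (11): correlation coefficient between two matrices."""
    A = A - A.mean()
    B = B - B.mean()
    return np.sum(A * B) / np.sqrt(np.sum(A**2) * np.sum(B**2))

rng = np.random.default_rng(2)
probe = rng.uniform(size=(100, 200))    # synthetic "restored" image
gallery = rng.uniform(size=(100, 200))  # synthetic enrolled template
```

A matched pair yields a score near 1, while unrelated images yield a score near 0, so thresholding the score gives the accept/reject decision behind the ROC curves.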

Fig. 13
figure 13

ROC curves of different finger-vein restoration results. ROC curve (a) belongs to the result of test set A; ROC curve (b) belongs to the result of test set B

Table 5 Equal Error Rates (%) of different restoration methods

In Section 4.2, we noted that the ground truth employed in this paper is approximate rather than completely accurate, because it contains some unstable, random interference branches. During training, it is hard for the convolutional filters in a CNN to find a uniform response that outputs those unstable branches. By contrast, the vein backbone is more reliable and robust, and can therefore be output stably and distinctly. More importantly, finger vein recognition performance is determined to a large extent by the robust vein backbone; randomly distributed, unstable branches do not contribute to recognition, but rather disrupt the structure of the vein backbone and thereby hinder recognition performance. To further illustrate this, we randomly select 2000 finger vein images (200 fingers × 10 images from Dataset A) and 1000 finger vein images (100 fingers × 10 images from Dataset A) for a recognition experiment. From the results shown in Fig. 14, finger vein images restored by FVSR-Net achieve lower EER values and better recognition performance.

Fig. 14
figure 14

ROC curves of proposed method and ground truth. ROC curve (a) belongs to 1000 finger vein images (100 fingers×10 images per finger in Dataset A); ROC curve (b) belongs to 2000 finger vein images (200 fingers×10 images per finger in Dataset A)

5 Conclusion

In this study, we propose an end-to-end convolutional neural network named FVSR-Net to address important problems in finger vein image restoration. First, based on the biological optical model, an improved biological optical model is put forward to estimate all parameters without any intermediate step and to output the restored image end-to-end. Then, we apply a multi-scale CNN to the finger vein scattering removal task and extract clear vein backbone features. Experimental results indicate that the proposed method obtains better visual and recognition performance. Finally, FVSR-Net not only works well in restoring weakly scattered images, but also achieves favorable restoration results on strongly scattered and low-quality images.

Moreover, there is room for improvement in future work: (1) Our proposed method is based on the Biological Optical Model, which is a simplified and idealized physical model, whereas the transformation between the blurred image and the ground truth may be highly nonlinear. Whether the design of the image restoration algorithm must rely on this physical model deserves further exploration. (2) Introducing attention-based models [3, 14, 22, 35, 38] may also help acquire more reliable results. For example, it is known that the estimation accuracy at one scale of a multi-scale design affects the next scale; inspired by [39], introducing a channel-wise attention mode could alleviate this multi-scale bottleneck. (3) How the ground truth is acquired affects restoration results to some degree. As such, training on raw images without ground truth via unsupervised [6] or weakly supervised attention models [36] is worth further investigation.