1 Introduction

For many years, the commercial sector had not faced a challenge as big as SARS-CoV-2 [20]. With social distancing as a basic rule and the need to ensure business continuity, companies of all sizes and sectors have migrated their products and services to 100% digital ecosystems [21]. Common and highly standardized procedures such as onboarding at banks and public institutions, which require the client to travel to the institution and complete long face-to-face procedures, were replaced by digital onboarding systems [1]. One step in digital onboarding consists of authenticating the identity of the client from the capture of identification photographs, the face and, in stricter cases, fingerprints [2]. Fingerphoto technology has been present in the academic literature for years, but in recent months it has become very popular in the biometrics and digital identity field. In this situation, remote and low-cost fingerprint extraction systems become highly necessary (low-cost compared to the traditional method, which requires the purchase of specialized sensors). These systems must operate in uncontrolled environments, facing changes in lighting, color, and background texture as well as in perspective and rotation during fingerphoto capture [3, 7, 10]. In this paper, we propose a preview of a fingerphoto extraction system that uses mobile cameras for capture. For now, the system is made up of background removal and fingerprint segmentation stages, which are the stages most affected by the capture method. According to what is reported in the state of the art, this is the first approach to fingerphoto background removal using deep learning, with U-Net proposed for finger-skin semantic segmentation. The final images are compatible and interoperable with touch-based fingerprint systems. The reported results provide a preview of a potential solution for biometric digital onboarding.

2 Related Work

Most current fingerprint recognition systems perform biometric acquisition using touch-based devices, such as optical and solid-state sensors [9]. Fig. 1 shows the main components of these traditional systems, which include image acquisition, enhancement, feature extraction, and matching [15]. Enhancement filters are sometimes unnecessary because the acquired images are already sufficiently clear.

Fig. 1. Traditional fingerprint recognition system

Touchless recognition systems are generally based on images captured by CCD cameras. These images are vastly different from those obtained using touch-based acquisition sensors [4, 9]. Most touchless fingerprint recognition systems in the literature can be characterized by three primary steps: acquisition, computation of a touch-equivalent fingerprint image, and feature extraction and matching [4]. This section presents reported techniques for acquiring, segmenting, and enhancing touchless fingerprint samples based on a single capture (two-dimensional samples).

2.1 Image Acquisition

Fingerprint sensors are designed for biometric use. In contrast, the main challenge of any fingerphoto system is that smartphone cameras are not designed for fingerprint collection [3, 4, 7]. Derawi et al. [11] describe the acquisition of fingerprints using two different smartphones, a Nokia and an HTC. They established a series of parameters for the acquisition protocol, such as keeping the LED flash on, the ISO speed, and the auto-focus tools provided by the smartphone in use, to ensure better performance of their proposal. However, they also show the obstacles present in any touchless fingerprint recognition system, such as image blurring, illuminance saturation, and perspective distortions. Fingerphoto recognition with a low-resolution camera, in a fixed position and under laboratory conditions, was tested in [13]. The authors proposed a continuous shooting mode in which the camera captures multiple photos of each test subject in one session, from which the best images are selected. Mueller et al. [14], in their work on biometric identity with low-cost equipment, carry out a series of experiments with a web camera and observe that its sharpness and focus must be configured to avoid reflections produced by the flash. In [8, 12], the challenges to be solved in touchless fingerprint detection systems are described, such as segmentation in uncontrolled environments and the extraction of features that are robust to the distortions associated with the absence of a support surface. Feature extraction is not part of this investigation; however, distortion correction is a possible technique to employ in future work.

2.2 Background Removal

For background subtraction, the available state-of-the-art schemes rely on capturing a single finger against a controlled background [10, 11, 13, 14]. Stein et al. [10] present a traditional process based on image processing algorithms that uses the red channel of the input image and a fixed threshold value to find the skin zone. The works in [5, 10, 13] present an adaptive skin thresholding algorithm to find the finger skin regions, together with a connected-component technique to isolate the candidate skin zone. All these proposals need controlled background and illumination conditions. Raghavendra et al. [7] perform background segmentation using color information by applying the mean shift segmentation algorithm on uncontrolled backgrounds, achieving 98% correct segmentation. Finally, [3] presents a multi-finger identification system based on the HSV color space that analyzes color probability frequencies, aided by Otsu segmentation and a GrabCut filter, to find the fingerprint object. In contrast to the other works, [3, 7] design systems without controlled background and illumination conditions, and they develop a mobile app to reduce these extrinsic variations.
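As a rough illustration of the fixed-threshold idea attributed to Stein et al. [10] above, the following Python sketch marks as skin every pixel whose red channel exceeds a threshold. The function name, the RGB channel order, and the threshold value of 120 are assumptions made for illustration; they are not taken from [10].

```python
import numpy as np


def red_channel_skin_mask(rgb_image, threshold=120):
    """Return a boolean mask that is True where the red channel exceeds `threshold`.

    `rgb_image` is assumed to be an (H, W, 3) array in RGB order; the default
    threshold is an arbitrary illustrative value, not a published one.
    """
    rgb_image = np.asarray(rgb_image)
    if rgb_image.ndim != 3 or rgb_image.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) RGB image")
    return rgb_image[..., 0] > threshold


# Tiny 1x3 example: a skin-like pixel, a dark background pixel,
# and a reddish background pixel that a fixed threshold cannot reject.
example = np.array([[[200, 150, 130], [20, 30, 40], [210, 40, 30]]], dtype=np.uint8)
mask = red_channel_skin_mask(example)
```

The third pixel hints at why such a rule depends on a controlled background: any sufficiently red background pixel is labeled as skin as well.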

2.3 Fingerprint Enhancement

Another potential step in the fingerprint recognition process is image enhancement [4, 9]. In traditional systems, global approaches are usually adopted for the enhancement of latent fingerprints or for the computation of synthetic fingerprint images [4]. A series of algorithms have been applied in different fingerphoto proposals, such as geometric normalization, passband filters [10], histogram equalization [6], PCA [5], and adaptive histogram equalization [3]. These algorithms have not been evaluated in isolation; they have been evaluated through the FAR and EER results of identity tests. This step is used only when the system must be compatible with traditional fingerprint systems.

3 Methods and Materials

This section explains the foundations of this research. The proposal is proprietary technology of our company; however, we provide an overview of the main components of the system.

3.1 Fingerphoto Acquisition

It has been shown that fingerphoto acquisition is the first and most critical component of a system [4, 13]. The lack of access to a validated and standardized database of touchless fingerprints captured in uncontrolled environments (for commercial purposes) requires building a database adapted to the specific purposes of this research. A series of rules for capturing the fingerphotos was established, guided by state-of-the-art recommendations [3, 4, 7, 10], as listed in Table 1.

Table 1. Recommended protocol steps for fingerphoto acquisition

For each individual, 10 images were required, one for each finger, captured with the individual's own smartphone. Because of the pandemic situation, the individuals were not in the same location, so each set of images had uncontrolled capture conditions. Finally, the images were sent via email.

3.2 Background Removal

Background removal is naturally a necessary step for fingerphotos. Many algorithms based on image processing and machine learning that could be used for this process have been documented. We assume that the finger is the only skin zone present in the image and that the rest is background. This research proposes the use of deep learning to address the challenge of capturing fingerphotos against uncontrolled backgrounds.

Deep Learning. Algorithms that operate under the deep learning paradigm can extract features directly from the raw input data, without the need to apply preprocessing techniques. They do so by learning representations through architectures composed of multiple layers, the number of which determines the depth of the model. Deep learning methods automatically learn successive layers of increasingly sophisticated and meaningful representations of the raw input data [8].

U-Net for Semantic Segmentation. U-Net is a CNN architecture designed by Ronneberger et al. [17] for biomedical problems involving the semantic segmentation of cells and neuronal structures. The goal of semantic image segmentation is to label each pixel of an image with the class of what it represents. In the case of fingerphoto segmentation, the pixels that represent the finger skin and contain the fingerprint are assigned to the finger class, and any other pixel in the image is assigned to the background class.

Dice Coefficient Metric. Intuitively, a successful prediction is one that maximizes the overlap between the predicted and true objects. Several metrics exist for this purpose; in this proposal, the Dice coefficient was used:

$$\begin{aligned} \text{Dice coefficient} = \frac{2\,\Vert A \cap B\Vert }{\Vert A\Vert +\Vert B\Vert } \end{aligned} \tag{1}$$

Here, A and B are two segmentation masks for a given class (the formula is general and can be computed for any two regions, e.g., a circle and a square), \(\Vert A\Vert \) is the norm of A (for images, the area in pixels), and \(\cap \) is the intersection.
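The overlap measure can be sketched for binary masks as follows. This is a minimal illustration using the standard Dice definition (twice the overlap divided by the sum of the two areas, so identical masks score 1); the function name and the convention of returning 1.0 for two empty masks are assumptions, not details taken from the paper.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice coefficient between two binary masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total_area = a.sum() + b.sum()
    if total_area == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    overlap = np.logical_and(a, b).sum()
    return float(2.0 * overlap / total_area)


# Toy masks: areas 2 and 1, overlap 1, so the score is 2 * 1 / (2 + 1).
predicted = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [0, 0]])
score = dice_coefficient(predicted, truth)
```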

3.3 Fingerphoto Segmentation and Enhancement

The first step for fingerprint segmentation and enhancement established in the state of the art is to convert the RGB input image to a grayscale fingerphoto. A bandpass filter is then applied to denoise the image, followed by a thresholding algorithm to highlight the ridges and valleys of the fingerphoto. For more details on the Wahab filter, see [18].
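The grayscale conversion and thresholding steps can be sketched as below. This is only an illustration under stated assumptions: the ITU-R BT.601 luma weights and the global mean threshold are choices made here for simplicity, the function names are hypothetical, and the bandpass filtering step is deliberately omitted.

```python
import numpy as np

# ITU-R BT.601 luma weights for R, G, B (an assumed choice of conversion).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_grayscale(rgb_image):
    """Convert an (H, W, 3) RGB image to an (H, W) float grayscale image."""
    rgb_image = np.asarray(rgb_image, dtype=np.float64)
    return rgb_image @ LUMA_WEIGHTS


def binarize(gray_image, threshold=None):
    """Return True where the pixel is brighter than the threshold.

    When no threshold is given, the global mean is used (an assumption;
    the paper does not state which thresholding algorithm is applied).
    """
    gray_image = np.asarray(gray_image, dtype=np.float64)
    if threshold is None:
        threshold = gray_image.mean()
    return gray_image > threshold


# One white pixel and one black pixel.
example = np.array([[[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)
gray = to_grayscale(example)
binary = binarize(gray)
```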

3.4 Equivalent Touch-Based Image

Samples captured by smartphone cameras (touchless sensors) cannot be used directly by recognition methods designed for touch-based fingerprint images [4]. Touchless images must also be normalized to a fixed resolution. This process helps to obtain a touch-equivalent fingerprint, which can then be used by matching techniques based on minutiae features. The standard fingerprint scale is a resolution of 500 dpi (the calibration of output images captured by contact sensors). The final normalization task is the registration of the minor axis of the image to the standard measure of 9/10 of the height of the final touch-equivalent image. Fingerprint images are stored in grayscale in PNG format.

3.5 NIST Fingerprint Image Quality

The performance of biometric systems depends on the quality of the acquired input samples [19]. The NIST Fingerprint Image Quality (NFIQ) algorithm is a standard method to assess fingerprint image quality. NFIQ was designed to predict the performance of minutiae matchers. It expresses quality in terms of utility, reflecting the predicted positive or negative contribution of an individual sample to the overall performance of a biometric system. NFIQ takes integer values in the interval \([1,5]\), with 1 being the best value and 5 the worst; NFIQ = 1, 2, 3 indicate good-quality acquisitions and NFIQ = 4, 5 indicate poor-quality ones. In addition, NIST is active in the ISO/IEC JTC1 SC37 standardization activities on biometric quality and sample conformance (ISO/IEC 29794).
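The good/poor grouping of NFIQ values described above can be expressed as a small helper. The sketch assumes the scores have already been produced by an NFIQ tool; it does not compute NFIQ itself, and the function name and the sample scores are hypothetical.

```python
from collections import Counter


def nfiq_quality_split(scores):
    """Return the fractions of samples with good (1-3) and poor (4-5) NFIQ scores."""
    scores = list(scores)
    if not scores:
        raise ValueError("at least one score is required")
    if any(s not in (1, 2, 3, 4, 5) for s in scores):
        raise ValueError("NFIQ scores must be integers from 1 to 5")
    counts = Counter(scores)
    good = counts[1] + counts[2] + counts[3]
    poor = counts[4] + counts[5]
    total = len(scores)
    return {"good": good / total, "poor": poor / total}


# Made-up scores, used only to exercise the function.
summary = nfiq_quality_split([1, 3, 4, 5])
```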

4 Experiments and Results

To evaluate the proposed method for fingerphoto extraction, we performed two experiments. The first experiment consisted of training and evaluating the U-Net skin model for background removal. The second experiment tests the quality of the dataset using the NFIQ metric.

Fig. 2. Touchless fingerprint extraction method

Figure 2 shows the general process carried out by the proposed method, with two main phases for deployment. The input image is cropped by hand to obtain the initial ROI (finger and non-finger regions). The U-Net CNN architecture is then used to perform the semantic segmentation task: U-Net takes the cropped image as input, with a binary mask as the target, and performs the background removal, generating an output that is expected to be in binary format. The binary mask and the input image are then multiplied pixel by pixel, so that the resulting image represents a reduced area that contains the fingerprint. The enhancement process therefore has a minimal amount of information to handle; its details are described in Sect. 3.3.
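The pixel-by-pixel multiplication of the mask and the input image can be sketched with NumPy broadcasting. This is a minimal illustration, not the system's implementation: the function name is hypothetical, and the 0.5 cut-off used to binarize a possibly non-binary network output is an assumption.

```python
import numpy as np


def remove_background(rgb_image, mask, threshold=0.5):
    """Zero out every pixel of `rgb_image` that the mask labels as background.

    `rgb_image` is an (H, W, 3) array and `mask` an (H, W) array holding
    either 0/1 labels or per-pixel finger probabilities.
    """
    rgb_image = np.asarray(rgb_image)
    mask = np.asarray(mask)
    if rgb_image.shape[:2] != mask.shape:
        raise ValueError("mask must have the same height and width as the image")
    # Binarize the mask, then broadcast it over the three color channels.
    binary = (mask >= threshold).astype(rgb_image.dtype)
    return rgb_image * binary[..., np.newaxis]


# 2x2 uniform image with a mask that keeps only the diagonal pixels.
image = np.full((2, 2, 3), 100, dtype=np.uint8)
predicted_mask = np.array([[0.9, 0.1], [0.2, 0.8]])
segmented = remove_background(image, predicted_mask)
```

Pixels kept by the mask retain their original color values, and all others become black, which leaves only the finger region for the later enhancement stage.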

4.1 Fingerphoto Dataset

Because of the pandemic situation, the fingerphoto capture was done by remote acquisition. The fingerphotos come from members of the Mexican population located in different states. A total of 478 fingerphotos were collected under an established confidentiality consent. Each participant used their own smartphone camera and followed the recommendations mentioned in Sect. 3.1.

4.2 U-Net for Background Removal

Once the data was collected, the next step was to manually select the bounding box of the area of interest (the fingerprint), generating the expected binary segmentation map for each image. After constructing the ground truth, data augmentation was performed using the Python ImageAug library, generating synthetic images with random variations of rotation and brightness. Images were rescaled to a resolution of \(512\,\times \,512\) pixels. Subsequently, three datasets were generated as follows: 70% train, 10% validation, and 20% test. The Adam optimizer was used, with a learning rate of \(1 \times 10^{-4}\) and binary cross-entropy as the loss function. To generate the performance history of the model, the accuracy and Dice coefficient metrics were used; they provide, respectively, the accuracy of the per-pixel classification and information about the overlap between the expected and predicted zones. Training was set to 500 epochs with a patience of 100. Google Colaboratory was used to train the convolutional network, taking advantage of the free GPU that Google provides per session. The best performance of the segmentation model on the validation set was obtained in epoch 321, with a validation accuracy of 98.76% and a Dice coefficient of 0.9799.
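The 70/10/20 partition can be sketched as a shuffled split. The helper name, the fixed random seed, and the rounding rule are assumptions made for illustration; the paper does not state how the partition was drawn.

```python
import random


def split_dataset(items, train_fraction=0.7, val_fraction=0.1, seed=0):
    """Shuffle `items` and split them into train, validation, and test lists.

    The test set receives whatever remains after the train and validation
    portions have been taken.
    """
    if train_fraction < 0 or val_fraction < 0 or train_fraction + val_fraction > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_fraction)
    n_val = round(len(items) * val_fraction)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Illustrative run on 100 placeholder sample identifiers.
train_ids, val_ids, test_ids = split_dataset(range(100))
```

Keeping every augmented variant of an image in the same partition as its original is a common precaution against leakage between the sets, although the text does not say whether this was done here.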

Fig. 3. Training metrics for U-Net semantic segmentation

A comparison was carried out to evaluate the viability of the proposed method against other proposals reported in the state of the art [5, 6, 10]. Figure 4 shows the results for two different input images with uncontrolled backgrounds. It presents the input image in RGB space, followed by the ground truth for finger segmentation.

Fig. 4. Comparison between the proposed method and state-of-the-art proposals

For visualization purposes, the methods of c) Stein et al. [10], Terawi et al. [5], and Sankaran et al. [6] were replicated. All these approaches are based on traditional image processing methods and focus on color attributes, which are inconsistent under uncontrolled acquisition. In comparison, the proposed CNN method, Fig. 4d), is capable of distinguishing shapes and textures, which improves finger segmentation for background removal tasks.

Then, each method was evaluated with the Dice coefficient metric on the test set. Table 2 reports the evaluation results.

The table is divided into four columns: the research replicated, the segmentation method, the average accuracy, and the average Dice coefficient observed on the test set. The proposed method achieves an accuracy of \(94.40\%\) and a Dice coefficient of \(0.9592\), outperforming the other methods reviewed. These results represent a more stable process under uncontrolled image acquisition and a much-improved reconstruction of the finger.

4.3 Fingerphoto Extraction

Neuro Verifinger and NFIQ were used to evaluate the quality and compatibility of the touchless images. The short-term objective is to have a system that is compatible and interoperable with commercial technology, and the long-term goal is compatibility with legacy databases. In traditional systems, NFIQ is a primary step used to check the quality of fingerprints. Although the NFIQ metric is not compatible with touchless finger images, we decided to evaluate the dataset to learn how the data is distributed. The first part was dedicated to evaluating the segmented images after applying only adaptive skin thresholding; an example is shown in Fig. 5d).

Table 2. Comparative table for finger semantic segmentation

Figure 5 shows the different processing steps that the input image undergoes in the system: a) is the input RGB image, b) is the binary mask predicted by the U-Net skin model, c) is the multiplication a) \(\times \) b) that performs the background removal, and d) is the image resulting from the first valley binarization. Neuro Verifinger is used to preview the minutiae of d).

Fig. 5. Fingerphoto processing method and evaluation

Fig. 6. NFIQ evaluation of dataset

The dataset evaluation obtained NFIQ scores of 1–3 for 47% of the images and 4–5 for 53% (NFIQ = 1 being the highest quality score and NFIQ = 5 the worst). The next step was to apply the image enhancement process, an important module established for touchless fingerprints. The bar chart in Fig. 6 shows the NFIQ distribution. After enhancement, NFIQ = 1–3 was obtained for 46% of the images and NFIQ = 4–5 for 54%. The results were very similar for both tests. Nevertheless, the images differ under visual inspection, and the enhanced images present fewer false minutiae. The extracted images are compatible with commercial biometric software, which can read them and find minutiae.

5 Conclusions and Future Work

The proposed fingerphoto semantic segmentation model performs better than other methods reported in the state of the art [7, 10], with an accuracy of 94.49% and a Dice coefficient of 0.9592 in the test evaluation, whereas the other approaches obtained test accuracies lower than 90%. The accuracy of this model can be improved with more images and different types of data augmentation. Background removal is a critical step in fingerphoto extraction because many background colors, shapes, and brightness levels can occur. U-Net models can learn from the attributes, shapes, and texture of the skin, unlike traditional methods, which usually take information from a single characteristic, generally the skin color attributes.

The dataset evaluation shows that the images of almost half of the individuals have poor quality for enrollment purposes. To increase the quality, a fingerprint enhancement was applied; NFIQ quality did not improve much, but the fingerprint reconstruction using Neuro Verifinger was better. Most of the images in the dataset are not of good quality because of the remote, uncontrolled acquisition. These samples do not affect the background removal stage, but fingerphoto enhancement is highly affected.

This research is a preview. For future work, it is necessary to continue working on data acquisition protocols to improve background removal and achieve a better NFIQ distribution in the collected data. Compatibility with commercial software and legacy databases is fundamental; to achieve it, finger detection, geometric distortions, and perspective changes need to be solved. Finally, it is necessary to stay aware of NIST updates for touchless biometrics.