
1 Introduction

With the rapid development of multimedia and digital imaging technology, it has become possible to collect large amounts of high-resolution images using sophisticated digital cameras or other high-resolution sensors. Because of their convenience and security, biometric recognition techniques have broad application prospects in identity authentication and network security. Among the various biometric authentication modalities, including fingerprint, face, and iris, fingerprint recognition is the most common. Fingerprints are characterized by uniqueness and stability, and, compared with traditional authentication methods, fingerprint recognition requires no password to be remembered. Accordingly, a personal identity can be associated with a fingerprint [1], and the authenticity of a user's identity can be verified by comparing his or her fingerprint features with features pre-saved in a database. Fingerprints are claimed to be safe and hard to steal or copy. However, with the increasing demand for high-level security in smart devices, research has shown that fingerprint authentication devices are unsafe and easily spoofed by artificial replicas made from wax, moldable plastic, Play-Doh, clay or gelatin [2] when the imitation is realistic enough and its texture is clear enough. Thus, the security of fingerprint recognition systems is threatened. To cope with this issue, fingerprint liveness detection (FLD) [3] methods have been proposed.

SIFT (Scale-Invariant Feature Transform) [4] is invariant to rotation, scale, and brightness, which makes it effective and efficient at expressing target texture information; however, it fails to extract features from smooth regions. LBP (Local Binary Pattern) [5] is, to some extent, invariant to rotation and to gray-level changes, but it is not scale invariant. HOG (Histogram of Oriented Gradient) [6] represents the structural texture information of edges (gradients) and can reflect local shape information. Because it neglects the influence of illumination in the image, the feature extracted with HOG contains considerable noise; on the other hand, thanks to its block-and-cell processing scheme, the relationship between a center pixel and its adjacent pixels is well represented. To describe the subtle differences between genuine and fake fingerprints and to exploit the advantages of each feature algorithm, this paper proposes a novel structure for fingerprint liveness detection.

The remainder of this paper is organized as follows. Section 2 describes work related to FLD in recent years. Section 3 presents the methodology. Experiments are given in Sect. 4. Finally, conclusions are drawn in Sect. 5.

2 Related Work

Nowadays, fingerprint recognition systems are widely deployed in mobile devices such as tablets, laptops, and smartphones. However, a common problem with these devices is that they neglect to verify the authenticity of fingerprints before identification; that is, they cannot distinguish between genuine and fake fingerprint images [7], which has led to the emergence of FLD technology to counter spoofing attacks. Researchers have devoted considerable effort over the past several years to differentiating genuine fingerprints from fake ones by analyzing different physical or physiological traits [7]. A survey of existing FLD methods shows that they fall into two categories: hardware-based and software-based. Recent studies have shown that hardware-based anti-spoofing methods can discriminate genuine fingerprints from fake ones by measuring physiological characteristics such as pulse oximetry, skin resistance, blood oxygen, temperature, and electrocardiogram signals. Although these measurements can separate genuine from fake fingerprints, the drawback is that the instruments are bulky, expensive, and require trained professionals to operate [8]. Hence, to save costs and simplify operation, an ideal anti-spoofing method should use as few additional hardware devices as possible and assign all the work to the computer; accordingly, novel software-based FLD methods that require no additional sensors have been proposed. Moreover, later maintenance of a software-based detection strategy is very convenient, requiring only a simple software upgrade.

Texture is an important visual trait that describes the homogeneity of an image and reflects the slowly changing arrangement of the surface structure. Genuine and fake fingerprints exhibit different texture phenomena, such as morphology, smoothness, and orientation; hence, texture representation has been used in FLD and is the main focus of this paper. At present, how to better represent the texture pattern of fingerprints and achieve higher classification accuracy is a hot topic in FLD. Abhyankar et al. [9] proposed an FLD method combining multiresolution texture analysis with inter-ridge frequency analysis. They used different texture measures to quantify how the pixel value distribution of a fingerprint changes when its physical structure changes, extracted features with multiresolution texture analysis and ridge-frequency analysis, and finally applied a fuzzy C-means classifier to the combined features to distinguish genuine fingers from fake ones. In 2012, Yuan [10] assessed the recently introduced Local Phase Quantization (LPQ) algorithm and applied it to FLD. A series of experiments based on several common feature extraction algorithms, including LPQ, LBP, and LBP variants [11], demonstrated that LPQ-based detection performance was the best.

3 Methodology

3.1 Feature Extraction

To distinguish genuine fingerprints from fake ones, the key is to extract differentiable features between them. Accordingly, this paper designs a novel fingerprint liveness detection structure that analyzes three popular feature extractors, SIFT, LBP, and HOG, to capture the subtle differences between genuine and fake fingerprints. On the one hand, the three methods extract different fine-grained (local) texture features of fingerprints; on the other hand, after feature fusion, the advantages of each algorithm can be fully exploited to recover the detailed information that a single feature cannot capture. Experimental results also demonstrate that the performance of feature fusion is better than that of any single feature extractor.

SIFT: The scale-invariant feature transform is a local feature descriptor that can detect the key subtle differences between genuine and fake fingerprints. As a stable local descriptor, SIFT remains unchanged when images are rotated or zoomed, and even when the intensity changes. First, the image is transformed to build a multi-scale space representation, and the main contours of the scale space are extracted from these sequences and regarded as feature vectors, so that key points at edges and corners can be extracted at different resolutions. Then, to ensure that the detected key points are local extrema in both scale space and the two-dimensional image plane, each pixel is compared with its adjacent pixels, and stable extrema are extracted across scales to guarantee the scale invariance of the key points. Moreover, to make the key points invariant to image angle and rotation, an orientation is assigned by computing the gradient at each extremum. Finally, the key point descriptor is generated as a unique vector by dividing the pixel area around the key point into blocks and computing the gradient histogram within each block. This vector is an abstract representation of the image (Fig. 1).

Fig. 1. Flow chart of fingerprint liveness detection based on multi-modal fine-grained feature fusion.

In the above calculation, the scale space L(x, y, σ) is obtained by convolving an original image I(x, y) with a variable-scale 2D Gaussian function G(x, y, σ). The two-dimensional Gaussian distribution is defined as follows:

$$ G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\left( -\frac{(x - x_{i})^{2} + (y - y_{i})^{2}}{2\sigma^{2}} \right) $$
(1)

The scale space of an image is computed by convolving the two-dimensional Gaussian kernel with the original image; it is expressed as follows:

$$ L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) $$
(2)
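As an illustration only (not the exact implementation used in this paper), the sketch below shows how SIFT keypoints and descriptors could be extracted with OpenCV, whose SIFT implementation internally builds the Gaussian scale space of Eqs. (1) and (2); the mean-descriptor pooling at the end is a hypothetical way of obtaining a fixed-length vector for later fusion.

```python
# Illustrative sketch only: SIFT keypoint/descriptor extraction with OpenCV.
import cv2
import numpy as np

def extract_sift_features(image_path):
    # Read the fingerprint image in grayscale.
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

    # OpenCV's SIFT builds the Gaussian scale space L(x, y, sigma) of Eqs. (1)-(2),
    # finds scale-space extrema, assigns orientations, and computes 128-D descriptors.
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)

    # Hypothetical pooling step: average the (N, 128) descriptor matrix into one
    # fixed-length vector so it can later be fused with the LBP and HOG features.
    if descriptors is None:
        return np.zeros(128, dtype=np.float32)
    return descriptors.mean(axis=0)
```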

LBP: LBP [12, 13] is an operator used to describe the local texture features of images, and it has the clear advantages of rotation invariance and gray-scale invariance. The goal is to measure the local contrast of a fingerprint and describe the local texture information of the image.

Before constructing the local texture, the given image is preprocessed and converted to a gray-scale image, and the relationship between the central pixel and its adjacent pixels is analyzed. The LBP operator is defined in a 3 × 3 window, with the central pixel as the threshold. The central pixel is compared with its 8 neighbors: if a neighboring pixel is larger than the central pixel, its position is marked as 1; otherwise it is marked as 0. In this way, an 8-bit binary number is generated by comparing the center with its 8 neighbors in the 3 × 3 window and arranging the results in sequence. This value becomes the new value of the window's central pixel and reflects the texture information. It is worth noting that there are 256 possible LBP values, each representing a different texture.

Because LBP records the difference between the central pixel and its adjacent pixels, when a lighting change increases or decreases all pixel gray values together, the LBP code changes little; therefore, LBP can be considered insensitive to illumination changes. Since LBP captures only the texture information of images, histogram statistics of the LBP map are further used as the feature operator for texture analysis. Generally, the image after the LBP operation is divided into many square regions, such as 4 × 4, 10 × 10 or 16 × 16 grids, from which 16, 100, or 256 histograms representing the features of the fingerprint image are obtained.
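A minimal sketch of the block-wise LBP histogram feature described above, assuming scikit-image is available; the 4 × 4 grid and the helper name `extract_lbp_features` are illustrative choices, not taken from the paper.

```python
# Illustrative sketch of the block-wise LBP histogram feature (scikit-image assumed).
import numpy as np
from skimage.feature import local_binary_pattern

def extract_lbp_features(gray, grid=(4, 4)):
    # Classic 3 x 3 LBP: 8 neighbours at radius 1 yields codes in [0, 255].
    lbp = local_binary_pattern(gray, P=8, R=1, method="default")

    # Divide the LBP map into grid cells and build a 256-bin histogram per cell.
    h, w = lbp.shape
    rows, cols = grid
    hists = []
    for i in range(rows):
        for j in range(cols):
            cell = lbp[i * h // rows:(i + 1) * h // rows,
                       j * w // cols:(j + 1) * w // cols]
            hist, _ = np.histogram(cell, bins=256, range=(0, 256))
            hists.append(hist / max(hist.sum(), 1))  # normalise each histogram
    return np.concatenate(hists)
```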

HOG [14] is short for histogram of oriented gradient; it is a local feature formed by computing gradient histograms of the given image. Since HOG represents the structural feature of edges (gradients), it can describe local shape information and is therefore a commonly used feature descriptor. The quantization of position and orientation space suppresses the influence of translation and rotation to some extent. Moreover, after normalizing the histograms within local regions, the influence of illumination changes can be partially offset.

The detailed implementation steps are as follows, with a brief code sketch after the list:

  • Step 1: Before the calculation, gray-scale conversion and gamma correction are carried out to reduce the influence of local shadows and lighting changes in the image; this also suppresses noise interference to some extent;

  • Step 2: To obtain the gradient histogram, the horizontal and vertical gradients of the image are computed by convolving the image with gradient filters;

  • Step 3: Next, the magnitude and direction of each pixel are calculated;

  • Step 4: After that, the image is divided into cells of 4 × 4 pixels, and a histogram of gradients is computed over the pixels in each cell;

  • Step 5: Before feature generation, to make the resulting features robust to changes in lighting, shadow and edges, the HOG features of each block are normalized. Finally, 4 × 4 cells form one block, and the features of the blocks are concatenated to obtain the final image feature, which is used for subsequent classification.
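The following sketch shows how these five steps could be realized with scikit-image's `hog` function; the cell and block sizes mirror the 4 × 4 values quoted above, but the exact settings used by the authors are not specified, so treat the parameters as assumptions.

```python
# Illustrative sketch of Steps 1-5 with scikit-image's hog(); parameter values
# (gamma, cell and block sizes) are assumptions, not the authors' exact settings.
from skimage import exposure
from skimage.feature import hog

def extract_hog_features(gray):
    # Step 1: power-law (gamma) correction to damp local shadows and lighting changes.
    corrected = exposure.adjust_gamma(gray, gamma=0.5)

    # Steps 2-5: gradient computation, per-cell orientation histograms, block
    # normalisation and concatenation are all performed inside hog().
    features, hog_image = hog(
        corrected,
        orientations=9,
        pixels_per_cell=(4, 4),
        cells_per_block=(4, 4),
        block_norm="L2-Hys",
        visualize=True,          # hog_image is the kind of map shown in Fig. 2
        feature_vector=True,
    )
    return features, hog_image
```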

Figure 2 shows visualizations of genuine and fake fingerprint features obtained with the HOG method. The features of the genuine fingerprint are evenly distributed, whereas the fake fingerprint is more damaged, with stains and other blurred regions.

Fig. 2. Genuine and fake fingerprints and their visualizations using HOG. (a) Genuine fingerprint. (b) Genuine fingerprint visualization. (c) Fake fingerprint. (d) Fake fingerprint visualization.

3.2 Feature Fusion Rule

The features extracted by the above three algorithms can be fused in many ways. To better describe the difference between genuine and fake fingerprints, we develop a new feature fusion rule to fuse the extracted features. Because the extracted features have different dimensions, they are difficult to splice directly; hence, before combination, the lower-dimensional features are zero-padded to a common length. Five feature fusion rules are defined in this paper: addition, maximum, minimum, average and concatenation. Table 1 reports the specific operation for each fusion rule, where F denotes a feature, the subscripts SIFT, LBP and HOG indicate the corresponding features, and the addition, maximum, minimum, average and concatenation operations are abbreviated as Add, Max, Min, Ave and Con, respectively.

Table 1. Feature fusion rules of different fusion operations.
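As a rough illustration of the rules summarized in Table 1 (the function name `fuse` and the zero-padding details are our own assumptions, not the paper's code), the fusion operations could be written as follows with NumPy:

```python
# Rough illustration (not the authors' code) of the fusion rules in Table 1.
import numpy as np

def fuse(features, rule="con"):
    """features: list of 1-D arrays, e.g. [f_sift, f_lbp, f_hog]."""
    # Zero-pad the shorter vectors to the longest length, as described in Sect. 3.2.
    length = max(f.shape[0] for f in features)
    padded = np.stack([np.pad(f, (0, length - f.shape[0])) for f in features])

    if rule == "add":          # element-wise sum
        return padded.sum(axis=0)
    if rule == "max":          # element-wise maximum
        return padded.max(axis=0)
    if rule == "min":          # element-wise minimum
        return padded.min(axis=0)
    if rule == "ave":          # element-wise average
        return padded.mean(axis=0)
    if rule == "con":          # splice the (padded) vectors end to end
        return padded.reshape(-1)
    raise ValueError(f"unknown fusion rule: {rule}")
```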

3.3 Parameter Optimization

After fusing the features with our proposed rule, the generated features are fed into an SVM (Support Vector Machine) classifier for subsequent training and testing. However, to obtain a better classifier, parameter optimization must be performed before model training.

SVM is a learning method based on the structural risk minimization criterion; depending on the kernel function used, it is divided into linear SVM and nonlinear SVM. To eliminate the influence of outliers and to limit the features to a certain range, a standardization operation is performed first, which removes the adverse effects caused by outlier samples and differing dimensions. Then, to obtain a robust and effective classifier, the parameters C and gamma (i.e. the pair <C, g>) are optimized for the fused features constructed with our proposed fusion rule. Figure 3 shows the parameter optimization results on the Biometrika dataset, from which the optimal parameter pair <C, g> can be read for each fusion rule. Finally, the fused features are used to train an SVM classifier with the optimal parameter pair <C, g>.

Fig. 3. Parameter optimization under different feature fusion operations on the Biometrika data set. (a) addition operation; (b) maximum operation; (c) minimum operation; (d) average operation; (e) concatenation operation.
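A minimal sketch of the <C, g> search, assuming scikit-learn's `SVC`, `StandardScaler` and `GridSearchCV`; the logarithmic grids are illustrative and not necessarily those used to produce Fig. 3.

```python
# Illustrative sketch: standardise the fused features, then grid-search <C, g>
# for an RBF-kernel SVM. Grid values are assumptions, not those behind Fig. 3.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_svm(X_train, y_train):
    # Standardisation removes the influence of differing feature scales and outliers.
    pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

    # Logarithmic grids over C and gamma (the <C, g> pair of Sect. 3.3).
    param_grid = {
        "svc__C": 2.0 ** np.arange(-5, 16, 2),
        "svc__gamma": 2.0 ** np.arange(-15, 4, 2),
    }
    search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_
```

The classifier returned by the search would then be applied to the testing set as described in Sect. 4.2.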

4 Experiments

4.1 Database

The detection performance of the proposed scheme is verified on the benchmark fingerprint image set LivDet 2013 [1], which consists of 16853 genuine and fake fingerprints acquired with four different sensors: Biometrika, CrossMatch, Italdata and Swipe. The set is split into a training dataset of 8450 images and a testing dataset of 8403 images. The training set is used to learn the model classifier, and the performance of the classifier is evaluated on the testing set. Note that the slight differences between real and fake fingerprints are hard to observe with the naked eye. The detailed distribution of the LivDet 2013 dataset is shown in Table 2, from which we can see that the ratio of genuine to fake fingerprints is approximately 1:1 and that the image sizes vary from 315 × 372 to 1500 × 208.

Table 2. The image distribution of the LivDet2013 data set.

4.2 Experimental Process and Performance Evaluation

First, to eliminate the influence of illumination and other factors, the fingerprint images are converted to gray scale. Next, the features of the fingerprints are extracted with three feature extraction algorithms: SIFT, LBP and HOG. Each of these algorithms is sensitive only to certain kinds of image changes and insensitive to others; hence, the classification performance of fingerprint liveness detection based on a single feature is unsatisfactory, and the experimental results in this paper confirm this point. One possible solution is to fuse the features so as to make up for the shortcomings of any single feature. Feature fusion thus combines the strengths of each algorithm and achieves complementarity of their weaknesses, and fusing multiple different algorithms can further enhance the final performance.

The feature fusion rules used in this paper fall into five types: addition, maximum, minimum, average and concatenation. Because the three algorithms differ, the dimensions of the extracted features are inconsistent; to perform the five fusion operations successfully, the shorter feature vectors are padded with zeros.

Since the distribution and range of each feature differ, the extracted features must be mapped to the same interval by normalization so that their components are comparable; rescaling to an appropriate range also makes training and testing faster. Before obtaining a robust classifier, parameter optimization is carried out to find the optimal parameter pair <C, g>, which is then used for subsequent model training and testing. Finally, the classification result is obtained with the trained classifier.

To verify the performance of the proposed feature extraction scheme, we adopt the average classification error (ACE) [15,16,17] as the evaluation metric, defined as follows:

$$ {\text{ACE}} = \frac{FAR + FRR}{2} $$
(3)

In formula (3), FAR denotes the probability that a fake fingerprint is mistaken for a genuine one, and FRR is the probability that a genuine fingerprint is mistaken for a fake one. The output of fingerprint liveness detection is a score between 0 and 100. With a threshold of 50, an image whose score exceeds 50 is recognized as a genuine fingerprint; otherwise it is recognized as fake. Finally, the performance of the proposed algorithm is obtained with formula (3).
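A short sketch of how Eq. (3) could be computed from liveness scores, assuming the 0-100 score range and the threshold of 50 described above (the labels convention and helper name are illustrative):

```python
# Illustrative sketch of Eq. (3): FAR, FRR and ACE from liveness scores in [0, 100],
# using the threshold of 50 mentioned above (labels: 1 = genuine, 0 = fake).
import numpy as np

def average_classification_error(scores, labels, threshold=50.0):
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    predicted_genuine = scores > threshold

    # FAR: fraction of fake fingerprints accepted as genuine (in percent).
    far = 100.0 * np.mean(predicted_genuine[labels == 0])
    # FRR: fraction of genuine fingerprints rejected as fake (in percent).
    frr = 100.0 * np.mean(~predicted_genuine[labels == 1])
    return (far + frr) / 2.0
```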

4.3 Results

First, we analyze and evaluate the performance of our method on the LivDet 2013 dataset under the different feature fusion rules (addition, maximum, minimum, average and concatenation). The results are reported in Table 3. In general, the detection performance after feature fusion is better than that of any single feature algorithm. For example, on the Biometrika dataset, the classification accuracies of SIFT, LBP and HOG are 86.7, 94.0 and 93.8, respectively, while after feature fusion the classification accuracy of LBP+HOG reaches 99.9; that is, feature fusion improves the discrimination between genuine and fake fingerprints. On the CrossMatch dataset, the classification accuracies of SIFT, LBP and HOG are 88.8, 90.6 and 90.5, respectively, and after fusion the accuracy of SIFT+LBP rises to 93.6, a further improvement in FLD performance. In addition, the time required to test each complete dataset is also listed in Table 3 and is quite acceptable; testing a single fingerprint is essentially imperceptible to the user, showing that our method is also applicable in real-life settings.

Table 3. Average classification accuracy and testing time of different feature fusions in the LivDet2013 fingerprint set.

Table 4 lists the detailed comparison results of different methods. In Table 4, the ACEs of our proposed method are among the lowest; to compare the algorithms, the best result for each sensor is highlighted in bold in each row. On the Biometrika sensor of LivDet 2013, our result is close to 0, and the ACE of our method is 0.02 lower than the second-best result [21]. The FLD method based on convolutional neural networks (CNN) achieves the best performance on CrossMatch, but it has some drawbacks: the training time of the model is long, the interpretability of CNN features is weak, a large number of training samples are required, and it relies on high-performance computers. The texture feature algorithms used here, which can capture the subtle differences between genuine and fake fingerprints, avoid these shortcomings of CNN. In addition, our ACEs on Italdata and CrossMatch are 2.75 and 2.8 higher than the results of [20, 21] in Table 4, respectively, whereas our ACEs on the Biometrika and Swipe sensors are 0.7 and 2.55 lower than those of [20, 21]. Together with Table 3, this suggests that different feature fusion methods should be chosen to obtain the best detection performance when the type of fingerprint scanner is known.

Table 4. The comparisons of the ACE of different algorithms in LivDet 2013.

5 Conclusion

It is well known that the SIFT feature descriptor is invariant to rotation, scale and brightness; the HOG feature descriptor reduces the influence of illumination on the image and lowers the feature dimension; and the LBP feature descriptor is insensitive to illumination and fast to compute. Combining the advantages of SIFT, LBP and HOG features compensates for the shortcomings of each individual algorithm and improves the final detection performance. The fused features are fed into an SVM classifier for training and testing. Experimental comparisons show that the classification performance based on features fused from SIFT, HOG and LBP is better than that of other FLD methods, and that our method is well suited to fingerprint liveness detection for preventing spoofing attacks with artificial replicas.