1 Introduction

Face-based biometrics is currently the most studied topic in image-based authentication applications [1]. Older authentication methods, such as passwords and barcode readers, are still available; however, once an unauthorized party has access to the key, it is very difficult to prevent unauthorized access to systems protected by these methods. Recent research on authentication is focused on biometrics such as iris, fingerprint, or face recognition to verify the authenticity of the user. Iris and fingerprint authentication systems have been actively researched and tend to be more accurate than older techniques [2]; such techniques, however, require intentional and active user contact with the device, which might feel unpleasant to the user. Face-based authentication is the most user-friendly and secure method [3], requiring little to no user contact with the device. It is, however, still vulnerable to spoofing, i.e., using photos, videos, or 3D maps of the user to gain unauthorized access [4]. Some systems try to prevent this by detecting a live facial action, such as yawning [5], but most face-based authentication methods do not provide a built-in anti-spoofing mechanism. In practice, anti-spoofing is usually treated as a problem independent of face-based authentication.

Different techniques have been proposed in the literature to detect spoofed images of a person, including hardware-based, challenge-response, and software-based anti-spoofing methods. Hardware-based solutions rely on specific hardware, making them difficult to implement in simple cameras or smartphones, while challenge-response methods ask users to perform specific actions during authentication. Performing these challenges, however, might feel unpleasant to the user. Software-based methods can be cheap and user-friendly; in the literature, most software-based anti-spoofing methods use grayscale images captured with conventional cameras to classify between spoofed and non-spoofed faces, thus reducing costs. We extend these methods by combining the luminance and chrominance components of color images, and our experiments show that our method not only improves detection accuracy but also minimizes detection errors.

The YCbCr color space allows differentiating between the luminance (Y) and chrominance (Cb and Cr) components. In this paper, we propose to extract texture information using Local Binary Patterns (LBP) from both the luminance and chrominance components provided by the YCbCr color space. Grayscale LBP features and the co-occurrence of LBPs are also used to avoid false positives. In the training phase, these features are combined in a feature matrix, and a binary Support Vector Machine (SVM) classifier is trained on the training set. In the testing phase, the same features are computed for each face and concatenated in a feature vector, which the trained classifier labels as spoofed or non-spoofed (real person).

The main goal of our method is to ease user interaction with a system while also increasing security. Current trends in security include two-factor authentication [6], but it burdens the user with extra steps, both during configuration and usage. In contrast, our method is transparent to the user and allows pleasant interaction with the system. Our proposed software-based anti-spoofing solution increases security and can be deployed in real-time systems without incurring any major processing overhead.

For experimental evaluation, we use the NUAA database [7]. We compare our method against other state-of-the-art methods [7,8,9,10] using the following metrics: Attack Presentation Classification Error Rate (APCER), Normal Presentation Classification Error Rate (NPCER), Average Classification Error Rate (ACER), False Positive Rate (FPR), True Positive Rate (TPR), True Negative Rate (TNR), and accuracy. Results show that our method not only presents better results for each metric (lowest APCER, NPCER, ACER, and FPR, and highest TPR, TNR, and accuracy) but also runs in real time, with an average of 26 frames per second in our experiments, providing high accuracy with little impact on authentication systems.

The paper is organized as follows: Sect. 2 reviews recent studies on spoofing detection and compares the theoretical aspects that motivated this work, and Sect. 3 explains the main concepts used in the development of the proposed method. Next, the proposed method is presented in detail in Sect. 4, followed by the experimental setup in Sect. 5 and the quantitative evaluation in Sect. 6. Finally, conclusions are drawn in Sect. 7.

2 Related Works

Software-based spoofing detection has motivated many studies. Pioneering solutions focused on texture-based methods. Li et al. analyzed the frequency distribution to differentiate between live and non-live face images [11]. The authors assumed that a photo image has fewer high-frequency components because of the flat structure of the photo, and that the standard deviation of its frequency components is small because of invariant expressions. These assumptions, however, do not hold for more sophisticated attacks, such as video or 3D-map attacks. Anjos et al. created a database and defined protocols for the evaluation of spoofing-attack solutions [12]. Sun et al. proposed an eye-blinking-based face liveness detection method [13]. Bao et al. used optical flow to detect face liveness [14]; however, the accuracy of the method drops substantially for people wearing glasses.

Furthermore, iris-based liveness detection methods are not practical because it is not uncommon to wear sunglasses in outdoor scenarios. Chingovska et al. utilized Local Binary Patterns to analyze texture for liveness detection [15]; the authors also included alternative LBPs such as transitional LBP (tLBP), direction-coded LBP (dLBP), and modified LBP (mLBP). These LBPs are used as feature vectors and are compared using \(\chi ^2\) histogram classification.

Research on software-based anti-spoofing was usually based on grayscale image analysis until recently. However, some recent works exploited color properties as vital visual cues for discriminating spoofed faces from real ones [3, 16]. Boulkenafet et al. proposed a color-based technique that analyzes contrast and illumination changes in the captured image or video [3]. LBPs are used as feature vectors and are combined to train a Support Vector Machine (SVM) classifier. In addition to these features, Co-occurrence of Adjacent Local Binary Patterns (CoALBP), Local Phase Quantization (LPQ), Binarized Statistical Image Features (BSIF), and the Scale-Invariant Descriptor (SID) are used as feature vectors for the classifier. This method has high accuracy; however, the feature vector is very large, which makes training and testing slow. In our proposed method, we use texture features extracted from multiple color spaces to classify between spoofed and non-spoofed face images. Several methods use luminance information to extract texture features for spoofing detection; however, the chrominance component also provides important cues. Our method extends these methods by combining luminance and chrominance components. It is worth mentioning that the LBPs are computed independently in the luminance and chrominance components of the YCbCr color space. Furthermore, grayscale LBP features and the co-occurrence of LBPs are also used to avoid false positives. These features are combined into a feature matrix, and an SVM classifier is trained on the training set to classify spoofed face images.

Convolutional Neural Networks (CNN) have also been used for face anti-spoofing [17]. Yang et al. trained a CNN with five convolutional (Conv) layers followed by three fully connected (FC) layers. After training the CNN, the features from the last fully connected layer are used to train an SVM classifier. However, exploiting the full potential of neural-network-based solutions requires plenty of pre-processed, labeled training data and substantial hardware capabilities. Furthermore, Kim et al. [8] proposed to use Local Speed Patterns (LSP) based on diffusion speed as a feature vector and to train a linear classifier to detect spoofing. The key idea behind this method is that the difference in surface properties between live and fake faces can be estimated using the diffusion speed.

In this work, we propose to utilize minimal texture features based on LBPs extracted in multiple color spaces. These features tend to be robust and allow real-time performance. The main building blocks of the proposed method are explained in the next section.

3 Fundamental Concepts

This section elaborates on the fundamental concepts used to develop our proposed method. Color spaces are explained in Sect. 3.1, followed by LBP feature extraction in Sect. 3.2, which is used to obtain texture features instead of using all the pixels. Finally, the computation of the Co-occurrence of Adjacent Local Binary Patterns (CoALBP) is explained in Sect. 3.3.

3.1 Color Spaces - YCbCr and Grayscale

YCbCr is a digital color space that represents an image in three channels: luminance (Y), blue-difference chrominance (Cb), and red-difference chrominance (Cr) [18]. The ability of YCbCr to separate the luminance and chrominance components makes it easier to handle them independently. Along with the luminance component, the chrominance components also provide important cues for image analysis. In our proposed method, we use the chrominance and luminance components independently to compute texture features.

Grayscale images also provide important information: a grayscale image represents the amount of light present in the scene, which provides a more meaningful luminance value than the Y channel of the YCbCr color space alone. For this reason, grayscale images are also used to compute texture features. In particular, the grayscale image is used to compute the grayscale LBP and CoALBP features.
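For illustration, the two conversions above can be sketched in a few lines. This is a minimal sketch assuming full-range ITU-R BT.601 coefficients (the exact constants depend on the YCbCr variant used); the function names are ours:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an 8-bit RGB image to YCbCr (full-range ITU-R BT.601)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def rgb_to_gray(rgb):
    """Luminosity-weighted grayscale (same BT.601 weights as Y)."""
    rgb = rgb.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
```

Note that, for a neutral (gray) pixel, both chrominance channels equal 128, so the texture information in Cb and Cr comes entirely from color variations rather than brightness.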

3.2 Local Binary Patterns

Local Binary Patterns (LBP) is a local descriptor that generates a binary code for a pixel neighborhood. Figure 1 shows two examples of computing the LBP of a center pixel by comparing its intensity with the neighboring pixel intensities. The value of the center pixel is used as a threshold, which is applied to the eight neighboring pixels [19]. Each neighboring pixel whose value is greater than the threshold is assigned a 1, and each value smaller than the threshold is assigned a 0. These 0s and 1s are concatenated in a clockwise manner, and the final binary vector is converted to an integer, which is the LBP of the central pixel, as can be seen in Fig. 1. The advantages of LBP include its invariance to illumination changes and low computational complexity. Therefore, LBP can be computed in real time and is capable of discriminating local texture.

Fig. 1.

Example of LBP computation: neighboring pixels are assigned 1 if greater than the center pixel value (24), or 0 otherwise, and the resulting binary code 01100110 is converted to an integer. Image reproduced from [19].
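The LBP computation described above can be sketched as follows (a minimal illustration with our own function names; we use a greater-or-equal comparison and a top-left clockwise bit order, both of which are conventions that vary between implementations):

```python
import numpy as np

def lbp_image(gray, radius=1):
    """Basic 8-neighbor LBP: threshold each neighbor against the
    center pixel and pack the results into an 8-bit code."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    out = np.zeros((h - 2 * radius, w - 2 * radius), dtype=np.uint8)
    # clockwise neighbor offsets starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = g[radius:h - radius, radius:w - radius]
    for bit, (dy, dx) in enumerate(offsets):
        dy, dx = dy * radius, dx * radius
        neighbor = g[radius + dy:h - radius + dy,
                     radius + dx:w - radius + dx]
        out |= ((neighbor >= center).astype(np.uint8) << bit)
    return out

def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes, used as the
    texture feature."""
    codes = lbp_image(gray)
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)
```

The descriptor used as a feature is the histogram of codes, not the code image itself, which is what makes it compact and robust to small spatial shifts.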

3.3 Co-occurrence of Adjacent Local Binary Patterns

The original LBP features ignore the spatial relationship among adjacent LBPs, which tends to contain important information about the details of the image [20]. Nosaka et al. [20] compute sparse LBPs and then compute the correlation between spatially adjacent LBPs in four directions, defined as \(A = \{ (\varDelta r,0), (0,\varDelta r), (\varDelta r,\varDelta r), (-\varDelta r,\varDelta r)\}\). For each direction \(a \in A\), a \(16 \times 16\) 2-D histogram is created; the resulting histograms are then reshaped and concatenated into a feature vector. Figure 2 shows the computation of CoALBP in all four directions: the LBP is computed in a 4-neighbor manner instead of an 8-neighbor one to reduce the computational cost.

Fig. 2.

Computation of CoALBP - image reproduced from [20]
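The CoALBP computation above can be sketched as follows. This is a minimal illustration under our own conventions (a plus-shaped 4-neighborhood and per-direction histogram normalization); the 4-neighbor LBP yields 16 patterns, so each direction contributes a reshaped \(16 \times 16\) histogram:

```python
import numpy as np

def lbp4(gray):
    """4-neighbor (plus-shaped) LBP: 16 possible patterns."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    c = g[1:h - 1, 1:w - 1]
    return ((g[0:h - 2, 1:w - 1] >= c).astype(np.int32)          # up
            | ((g[1:h - 1, 2:w] >= c).astype(np.int32) << 1)     # right
            | ((g[2:h, 1:w - 1] >= c).astype(np.int32) << 2)     # down
            | ((g[1:h - 1, 0:w - 2] >= c).astype(np.int32) << 3))  # left

def coalbp_histogram(gray, dr=1):
    """One 16x16 co-occurrence histogram per direction in
    A = {(dr,0), (0,dr), (dr,dr), (-dr,dr)}, reshaped and
    concatenated into a 4 * 256 = 1024-dim feature vector."""
    lbp = lbp4(gray)
    h, w = lbp.shape
    feats = []
    for dy, dx in [(dr, 0), (0, dr), (dr, dr), (-dr, dr)]:
        y0, x0 = max(0, -dy), max(0, -dx)
        y1, x1 = min(h, h - dy), min(w, w - dx)
        a = lbp[y0:y1, x0:x1]                      # reference LBPs
        b = lbp[y0 + dy:y1 + dy, x0 + dx:x1 + dx]  # adjacent LBPs
        hist2d = np.zeros((16, 16), dtype=np.float64)
        np.add.at(hist2d, (a.ravel(), b.ravel()), 1)
        feats.append(hist2d.ravel() / max(hist2d.sum(), 1))
    return np.concatenate(feats)
```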

4 Proposed Method

Face spoofing is mostly performed by presenting printed photos of target faces, displayed videos, or masks to the input sensors [8]. The simplest attacks, e.g., using mobile-phone displays, can be detected easily using texture analysis because of the artifacts in the image. However, higher-quality spoofed faces are difficult to detect [3]. Some examples of real and spoofed images from the NUAA database [7] are shown in Fig. 3. Real and spoofed images look very similar, and the task of spoofing detection is not trivial. The distribution of luminance, however, is more uniform in spoofed images (as shown in the second row of Fig. 3). This is because human eyes are more sensitive to luminance. Furthermore, the chrominance components are important cues for detecting recaptured images. For this reason, this work focuses on utilizing both the luminance and chrominance components.

In order to analyze the luminance and chrominance cues for spoofing detection, LBP histograms are computed and tested against the means of the LBPs from the training data, which contains spoofed and non-spoofed face images. Two LBP means are computed: the mean of the spoofed images, \(\mu _{s}\), obtained by averaging the LBP histograms of all spoofed training images, and the mean of the non-spoofed images, \(\mu _{c}\), where c stands for client (non-spoofed) images. After computing the means, the distance between the test face image and each mean is computed to classify spoofing. The chi-square distance is robust for comparing histograms and is computed as:

$$\begin{aligned} d_{x^2}(H_x ,\mu _{c})=\sum _{i=1}^{n} \frac{(H_x (i)-\mu _{c} (i))^2}{(H_x (i)+\mu _{c} (i))}, d_{x^2}(H_x ,\mu _{s})=\sum _{i=1}^{n} \frac{(H_x (i)-\mu _{s} (i))^2}{(H_x (i)+\mu _{s} (i))}, \end{aligned}$$
(1)

where \(H_x (i)\) represents the \(i^{th}\) bin of the LBP histogram of the tested face image.
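Equation (1) and the resulting nearest-mean decision can be sketched as follows (a minimal illustration; the small `eps` term, added to avoid division by zero on empty bins, is our own implementation detail):

```python
import numpy as np

def chi_square_distance(h, mu, eps=1e-12):
    """Chi-square distance between a test histogram h and a
    class-mean histogram mu, as in Eq. (1)."""
    h = np.asarray(h, dtype=np.float64)
    mu = np.asarray(mu, dtype=np.float64)
    return np.sum((h - mu) ** 2 / (h + mu + eps))

def classify_by_distance(h, mu_client, mu_spoof):
    """Assign the image to whichever class mean is closer."""
    d_c = chi_square_distance(h, mu_client)
    d_s = chi_square_distance(h, mu_spoof)
    return "non-spoofed" if d_c <= d_s else "spoofed"
```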

Figure 4a shows the chi-square distances \(d_{x^2}(H_x ,\mu _{c})\) and \(d_{x^2}(H_x ,\mu _{s})\) of the LBP histograms taken from non-spoofed (client, or live) faces in the test set, and Fig. 4b shows the same distances for the grayscale LBP histograms of the spoofed faces in the test set. An image is assigned to the class with the smaller distance: if \(d_{x^2}(H_x ,\mu _{c}) \le d_{x^2}(H_x ,\mu _{s})\), the image is classified as non-spoofed; otherwise, it is classified as spoofed.

Similarly, the chi-square distances of the chrominance components are shown in Fig. 4c and d. It can be seen that the chrominance component is more robust in the detection of spoofed images than grayscale, while grayscale has better detection of non-spoofed images.

To complement these features, CoALBP is also used, as shown in Fig. 5a and b. Finally, the concatenation of these features is shown in Fig. 5c and d; the concatenated features are more promising in discriminating spoofed from non-spoofed faces. With the individual features, the LBPs of non-spoofed faces are closely related to the non-spoofed training mean \(\mu _{c}\), while the LBPs of spoofed faces are only loosely related to the spoofed training mean \(\mu _{s}\). When the concatenated features are used, however, the separation improves. This analysis allows our method to accurately classify spoofed and non-spoofed faces.

Fig. 3.

Example images from NUAA database; Original images in the first row and spoofed images in the second row

Fig. 4.

Chi-square distance of the \(\mu _{c}\) and \(\mu _{s}\) with (a) non-spoofed and (b) spoofed face images LBPs in Grayscale; (c) non-spoofed and (d) spoofed face images LBPs in Chrominance (Cr) channel of YCbCr color space.

Fig. 5.

Chi-square distance of the \(\mu _{c}\) and \(\mu _{s}\) with (a) non-spoofed and (b) spoofed face images of CoALBPs; (c) non-spoofed and (d) spoofed face images when using concatenated LBPs.

4.1 Feature Extraction

In our proposed method, multiple features are computed and then concatenated to make the method robust against spoofing attacks. We use five LBP-based features, computed in either grayscale or the YCbCr color space. First, the RGB image retrieved from the camera is converted into a grayscale image (\(I_g\)) and a YCbCr image (\(I_{yc}\)). LBP features are extracted from the grayscale and YCbCr face images of the training dataset. Furthermore, the grayscale images are also used to compute the Co-occurrence of Adjacent Local Binary Patterns (CoALBP), as described in Sect. 3.3. After these features are computed for each image, each feature is reshaped into a vector, and the vectors are combined into one large feature vector. After this process is completed for every image, the feature vectors of the whole dataset are combined into a feature matrix of size \(n \times m\), where m is the size of the feature vector and n is the number of training images.
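The feature-extraction pipeline above can be sketched as follows. This is a simplified illustration with our own function names: `lbp_hist` stands in for the per-channel LBP and CoALBP descriptors of Sects. 3.2 and 3.3, and the concatenation and stacking mirror the construction of the \(n \times m\) feature matrix:

```python
import numpy as np

def lbp_hist(channel):
    """Simplified per-channel descriptor: normalized 256-bin
    histogram of 8-neighbor LBP codes."""
    g = np.asarray(channel, dtype=np.int32)
    c = g[1:-1, 1:-1]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate([(-1, -1), (-1, 0), (-1, 1), (0, 1),
                                    (1, 1), (1, 0), (1, -1), (0, -1)]):
        nbr = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= ((nbr >= c).astype(np.int32) << bit)
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)

def extract_feature_vector(gray, ycbcr):
    """Concatenate descriptors from grayscale and Y, Cb, Cr channels
    into one large feature vector."""
    parts = [lbp_hist(gray)] + [lbp_hist(ycbcr[..., k]) for k in range(3)]
    return np.concatenate(parts)

def build_feature_matrix(images):
    """Stack per-image feature vectors into an n x m training matrix,
    where n is the number of images and m the feature-vector size."""
    return np.vstack([extract_feature_vector(g, yc) for g, yc in images])
```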

4.2 Training

After the training matrix is computed, a binary SVM classifier is trained with the labels of the training data to classify spoofed and non-spoofed images. For SVM training and testing, the LibSVM library is used [21].
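The training and classification steps can be sketched as follows. For illustration we use scikit-learn's `SVC`, which wraps LibSVM internally; the kernel and label convention here are our own assumptions, not details taken from the paper:

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn's SVC wraps LibSVM

def train_antispoof_svm(feature_matrix, labels):
    """Train a binary SVM on the n x m feature matrix
    (labels: 1 = non-spoofed, 0 = spoofed; assumed convention)."""
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(feature_matrix, labels)
    return clf

def classify_face(clf, feature_vector):
    """Classify one concatenated feature vector (Sect. 4.3)."""
    return int(clf.predict(feature_vector.reshape(1, -1))[0])
```

At test time, the same feature extraction of Sect. 4.1 is applied to the input image and the resulting vector is passed to `classify_face`.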

4.3 Classification

When a test image is given, all the features are computed in the same way as for the training set (as described in Sect. 4.1). The resulting feature vector is then given to the trained SVM classifier for spoofing detection.

5 Experimental Evaluation

Our proposed method was implemented in Matlab on a PC with an Intel Core i7-7500U CPU @ 2.70 GHz, 12 GB of RAM, and the Windows operating system. For the quantitative evaluation, the following metrics are used to compare the proposed method with state-of-the-art methods: Attack Presentation Classification Error Rate (APCER), Normal Presentation Classification Error Rate (NPCER), Average Classification Error Rate (ACER), False Positive Rate (FPR), True Positive Rate (TPR), True Negative Rate (TNR), and accuracy.

Attack Presentation Classification Error Rate (APCER):

$$\begin{aligned} APCER = \frac{FP}{TN + FP} \end{aligned}$$
(2)

Normal Presentation Classification Error Rate (NPCER):

$$\begin{aligned} NPCER = \frac{FN}{FN + TP} \end{aligned}$$
(3)

Average Classification Error Rate (ACER):

$$\begin{aligned} ACER = \frac{APCER + NPCER}{2} \end{aligned}$$
(4)

True Positive Rate (TPR):

$$\begin{aligned} TPR = \frac{TP}{TP + FN} \end{aligned}$$
(5)

False Positive Rate (FPR):

$$\begin{aligned} FPR = \frac{FP}{FP + TN} \end{aligned}$$
(6)

Accuracy:

$$\begin{aligned} Accuracy=\frac{TP+TN}{TP+TN+FP+FN} \end{aligned}$$
(7)

The goal of an anti-spoofing algorithm is to achieve the smallest APCER, NPCER, ACER, and FPR values and the highest TPR, TNR, and accuracy values.
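Equations (2)-(7) can be computed directly from the confusion-matrix counts; the sketch below assumes the convention that the positive class is the non-spoofed (live) face:

```python
def spoof_metrics(tp, tn, fp, fn):
    """Evaluation metrics of Eqs. (2)-(7) from confusion-matrix
    counts (positive class = non-spoofed/live face)."""
    apcer = fp / (tn + fp)                 # Eq. (2): attacks accepted
    npcer = fn / (fn + tp)                 # Eq. (3): live faces rejected
    acer = (apcer + npcer) / 2             # Eq. (4)
    tpr = tp / (tp + fn)                   # Eq. (5)
    fpr = fp / (fp + tn)                   # Eq. (6)
    tnr = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)  # Eq. (7)
    return {"APCER": apcer, "NPCER": npcer, "ACER": acer,
            "TPR": tpr, "FPR": fpr, "TNR": tnr, "accuracy": acc}
```

Note that, with these definitions, APCER and FPR coincide, and NPCER equals 1 − TPR.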

5.1 Dataset

For the experimental evaluation, the NUAA database was used. The NUAA database was constructed as part of the method Face Liveness Detection from a Single Image with Sparse Low Rank Bilinear Discriminative Model [7]. It has 11,752 original and spoofed face images from 15 subjects, divided into training and test sets. The database was recorded using a webcam at 20 fps under varied illumination conditions over a span of two weeks. Spoofing in the NUAA database is done by printing photos of the subjects and photographing these prints to re-capture the images. Some examples from the database are shown in Fig. 3: the first row shows the original captured images and the second row shows the spoofed images.

6 Quantitative Evaluation

Table 1 shows the results of our proposed method with different sets of features. Our proposed method is compared against the following methods: Total variation models for variable lighting face recognition [9], Face Liveness Detection from a Single Image with Sparse Low Rank Bilinear Discriminative Model [7], Deep Feature Extraction for Face Liveness Detection [10] and Face Liveness Detection From a Single Image via Diffusion Speed Model [8].

In Table 1, the first four rows report the results of the four state-of-the-art methods on the NUAA dataset. Each subsequent row indicates the results of its respective feature independently, and the last row shows the results of the combined features. For APCER, NPCER, ACER, and FPR, the smallest value indicates the best result; for TPR, TNR, and accuracy, higher values are better. Table 1 shows that our proposed combined LBP method (last row) performs better on all metrics. In terms of accuracy, Kim et al. [8] performs relatively better than the other state-of-the-art methods.

Even the simplest feature set, the grayscale LBP, outperforms the state-of-the-art methods in terms of accuracy; however, it has a non-zero APCER, which can allow some frauds. For this reason, robustness against spoofing attacks is achieved by combining the LBP histograms of the luminance and chrominance components with the co-occurrence of LBPs. Our proposed combined LBP method has the smallest APCER and FPR, and thus provides robustness against spoofing attacks; it also runs in real time, with an average of 26 frames per second in our experiments, providing high accuracy with little time impact on authentication systems.

Table 1. Experimental evaluation of the proposed feature set and comparative methods accuracy (best results in bold, second best results in italic).

7 Conclusions

This work proposes a robust anti-spoofing algorithm that performs in real time. Our proposed method uses LBP features as texture descriptors in the YCbCr color space and in grayscale. Furthermore, CoALBP is used to exploit local spatial information. These multiple LBP features are concatenated into one feature vector to represent a face image. The proposed method has two phases: a training phase and a test phase. In the training phase, the concatenated LBPs and CoALBPs of the training images are combined into a feature matrix, which is used to train a binary SVM. In the test phase, the test image's LBPs and CoALBP are computed and concatenated into a feature vector, which is then classified by the trained SVM. Experimental results show that our proposed method performs better than state-of-the-art methods. In particular, our concatenated LBP method has the smallest APCER and FPR, and thus provides robustness against spoofing attacks. Our anti-spoofing method can be used in real-time environments, with an average of 26 frames per second in our experiments, providing high accuracy with little time impact on authentication systems.