
1 Introduction

Biometric technology has gained popularity as a result of the rapid expansion of Internet technologies, and it is now extensively used in intelligence and security protection, criminal proceedings, finance and social security, clinical training, and other fields. Face recognition is more readily accepted by the public than other biometric identification systems owing to its high security, authenticity, and non-contact nature, and it has become an important research direction for academia and industry [1]. Face recognition (FR) technology, however, is vulnerable to spoofing attacks by unauthorized users, posing a serious threat to system integrity. As a result, building a face anti-spoofing system with high recognition performance, fast response time, and strong robustness is critical [2].

Face anti-spoofing (FAS) detection is the task of determining whether a newly captured facial image comes from a live human or a spoofed face. FAS research has been particularly active in recent years, both domestically and abroad, owing to its significant academic and practical value. Print, video-replay, and 3D mask attacks are the most common spoofing attacks. Real and spoofed faces differ in several respects, mostly in image texture, motion cues, and depth cues [3]. By exploiting these differences, various FAS systems can be built to distinguish real faces from counterfeit ones. FAS research has progressed rapidly in recent years, yielding numerous useful results. This study examines deep learning (DL)-based methods, their merits and weaknesses, and the development trend of FAS.

With DL's continued advancement and remarkable performance in the field of FR, an increasing number of researchers have applied it to FAS to explore more comprehensive techniques for combating face spoofing. In contrast to traditional hand-crafted feature extraction (FE), DL can learn from images autonomously, extract richer and more discriminative facial features, and thus help distinguish real faces from fake ones more effectively.

A convolutional neural network (CNN) [4] was first proposed to extract features for FAS, which opened a new branch of DL research in the field [5]. Because the technology was not yet mature, its recognition performance was significantly lower than that of conventional approaches. Nevertheless, the superiority of DL in feature extraction prompted a significant amount of research into DL-based FAS. Through network upgrades, transfer learning (TL) [6], combinations of various features, and domain generalization, DL-based FAS has progressively advanced and, thanks to the unwavering dedication and repeated attempts of many researchers, has now surpassed the earlier techniques [7].

2 Related Work

Despite significant advances in face recognition systems, face spoofing remains a serious risk. Most academic and commercial FR systems can be fooled by an image, a video, or a 3D face model of a genuine user; a face image reverse-engineered from the template of a genuine user; a sketch of a genuine user; and so on. We present a brief summary of published face-spoofing detection techniques. CNNs have proven superior to alternative learning frameworks in a variety of computer vision tasks. For facial images, a distinctive feature representation approach known as HGC-CNN has been employed to identify face spoof attacks in color photos. It is a multi-feature learning system that combines capsule neural networks with hypergraph regularization. The capsule network can incorporate a variety of characteristics, including intensity values, LBP, and image quality, while hypergraph regularization can be used to learn relationships between samples. Including locality information further improves the expressive power of the extracted features. Because the new representation is compatible with existing classifiers, an SVM was used for classification. According to experimental results on the NUAA database and the Multispectral spoofing database, the approach outperformed prior methods for face spoof attack detection on color photos [4].

Another approach, presented by Yousef Atoum et al., combines two CNN streams. Unlike most previous FAS methods, which use only the entire face to identify presentation attacks, it exploits both the whole facial image and regions extracted from the same face to differentiate spoofed from live faces. The first CNN stream operates on features of patches collected from different face regions; it proves robust to all types of presentation attacks, particularly on lower-resolution face images. The second CNN stream uses the whole facial image to estimate face depth. Their experiments suggest that the depth estimation can produce impressive results, especially on higher-resolution images [8]. Gene LBPnet, a novel CNN technique based on LBP for face spoofing detection, is presented by Karuna Grover and Rajesh Mehra. On the NUAA dataset, this method outperformed previous state-of-the-art algorithms; across several evaluation parameters, it achieves high accuracy (98%) and a low Equal Error Rate, leading to better recognition of spoofing attempts and thereby improving system security [9]. Another system integrates CNN and RNN architectures to jointly estimate the depth of face images and the rPPG signal of face videos; the estimated depth and rPPG are combined to discriminate between real and fake faces. The authors also introduce the SiW face anti-spoofing database, which covers a wider range of lighting, subject, and pose variations than previous datasets, and demonstrate the technique's advantage experimentally [10]. For face liveness detection, Zahid Akhtar et al. propose seven strategies for selecting discriminative patches in a facial image. The features of the selected discriminative patches are fed to a classifier, and the classification results of these regions are pooled using a majority-voting scheme for the final decision between genuine and spoofed faces. Experimental results on two publicly available datasets show results comparable to prior work [11]. Finally, to raise the security level of an FAS system, a model for identifying liveness-attack images was introduced that accounts for the differences between the attributes of real and fake faces; integrating multiple types of image information improves attack-detection effectiveness considerably compared with using a single cue [12].

3 Proposed Methodology

  • Step 1: Collect the CASIA v2 image dataset, which is freely available.

  • Step 2: Clean the data and remove noisy samples.

  • Step 3: Identify and remove noisy images, and shuffle the data.

  • Step 4: Reshape the data features and samples and split them into training and testing sets.

  • Step 5: Pass the data into the training model.

  • Step 6: Split the fake and real image samples into training and testing sets (3331, 833), i.e., 70% for training and 30% for testing (a minimal split sketch is shown after Fig. 1).

  • Step 7: After training, measure the performance parameters: accuracy, recall, and precision (Fig. 1).

Fig. 1 Flow chart of the proposed methodology
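Steps 4 to 6 amount to a standard reshape-and-split routine. The following is a minimal Python sketch of that step, assuming a Keras/scikit-learn workflow; the image size (128 × 128 × 3), the variable names, and the 0/1 label encoding are illustrative assumptions rather than details given in the paper.

```python
# Sketch of Steps 4-6: reshape the samples into image tensors and
# split them into training and testing sets (70/30 as stated above).
import numpy as np
from sklearn.model_selection import train_test_split

def prepare_splits(images, labels, test_size=0.30, seed=42):
    # Reshape to (N, 128, 128, 3) and scale pixel values to [0, 1];
    # the target size is an assumption, not specified in the paper.
    X = np.asarray(images, dtype="float32").reshape(-1, 128, 128, 3) / 255.0
    y = np.asarray(labels)  # assumed encoding: 0 = real, 1 = fake
    return train_test_split(X, y, test_size=test_size,
                            random_state=seed, shuffle=True)

# X_train, X_test, y_train, y_test = prepare_splits(images, labels)
```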

3.1 Dataset Gathering

The proposed method is evaluated on the CASIA v2 image dataset, which is frequently used for image forgery detection and is freely available. It contains 4795 photos in total, with 1701 authentic and 3274 forged images.

3.2 Data Pre-processing

The goal of pre-processing is to improve the image data by suppressing unwanted distortions or enhancing particular image properties that are important for subsequent processing and evaluation.

  (a) Data Cleaning: Data cleaning is the act of identifying and correcting (or removing) corrupted or inaccurate records from a record set, table, or database. It involves recognizing incomplete, incorrect, faulty, or redundant data and then updating, modifying, or deleting the dirty or imprecise data.

  (b) Checking Noisy Images: Image noise is a type of disturbance that produces random variations in image intensity or color. It can be introduced by the image sensor and circuitry of a scanner or digital camera; film grain and the unavoidable shot noise of an ideal photon detector can likewise cause image noise. We check the original and noisy images in the dataset and convert all the images to Error Level Analysis (ELA) representations for better performance (a minimal ELA conversion sketch is given after this list).

  (c) Data Shuffling: Shuffling strategies jumble the data while retaining logical relationships among columns if desired. They randomly rearrange the data within an attribute (e.g., a column in flat format) or across a set of attributes (e.g., a group of columns). Figure 2 shows an original image and its ELA image.

Fig. 2 The original image and the corresponding ELA image
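Since the pre-processing converts each image to an ELA representation (Fig. 2), the sketch below illustrates one common way to compute ELA with Pillow; the JPEG re-save quality of 90 and the temporary file name are assumptions, not values reported in the paper.

```python
# Minimal ELA sketch: re-save the image as JPEG and amplify the
# compression residue, which tends to highlight manipulated regions.
import os
from PIL import Image, ImageChops, ImageEnhance

def convert_to_ela(path, quality=90):
    original = Image.open(path).convert("RGB")
    tmp_path = "tmp_resaved.jpg"                      # hypothetical temp file
    original.save(tmp_path, "JPEG", quality=quality)  # lossy re-save
    resaved = Image.open(tmp_path)

    ela = ImageChops.difference(original, resaved)    # pixel-wise residue
    extrema = ela.getextrema()                        # per-channel (min, max)
    max_diff = max(ch[1] for ch in extrema) or 1      # avoid division by zero
    ela = ImageEnhance.Brightness(ela).enhance(255.0 / max_diff)  # stretch to full range
    os.remove(tmp_path)
    return ela

# ela_img = convert_to_ela("sample.jpg").resize((128, 128))
```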

3.3 Model Parameter

Figure 3 shows the model parameters, which are explained below:

Fig. 3 Model parameters

  (1) Conv2D: Conv2D is a 2-D convolution layer that produces a tensor of outputs by convolving a convolution kernel with the layer's input [13].

  (2) Max-Pooling: Max pooling selects the largest element from the region of the feature map covered by the filter. As a consequence, the output of the max-pooling layer is a feature map (FM) containing the most prominent features of the previous FM [14].

  (3) Dropout Layer: Dropout is a strategy for avoiding overfitting. At every iteration of the training stage, dropout sets the outgoing edges of randomly selected hidden units (neurons) to zero [15].

  (4) Flatten Layer: Flattening converts the data into a one-dimensional array for use in the next layer. The output of the convolutional layers is flattened into a single long feature vector, which is then connected to the fully connected layers that perform the final classification [16].

  (5) Dense Layer: A dense layer is fully connected to the preceding layer, meaning that each of its neurons is connected to every neuron of the previous layer. It is the most commonly used layer in artificial neural networks. The output of a dense layer is an ‘m’-dimensional vector, so the layer is typically used to change the dimensionality of the vector; it can also apply operations such as rotation, scaling, and translation to the vector [17].

The neural network model is trained sequentially. The employed NN model with its layers is shown in Fig. 3. The dataset is split into 70% training and 30% testing sets of fake (3331) and real (833) images. The NN is trained using ReLU and Sigmoid as the activation functions. The first layer of the network is a Conv2D layer with 2432 parameters, followed by a MaxPooling2D layer, and then another Conv2D and max-pooling layer. The dropout, flatten, and dense layers are then applied in a cascaded manner. Table 1 lists the training hyperparameters: the Adam optimizer is used for 20 epochs with a batch size of 32.

Table 1 Hyperparameters of training
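A sketch of the described network in Keras follows. Only the layer order, the 2432-parameter first Conv2D (32 filters of 5 × 5 over 3 input channels), the ReLU/Sigmoid activations, and the Adam/20-epoch/batch-32 settings come from the text; the input size, the second convolution's filter count, the dropout rate, and the hidden dense width are assumptions.

```python
# Sketch of the sequential CNN described in Fig. 3 and Table 1.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, (5, 5), activation="relu", input_shape=(128, 128, 3)),  # 2432 parameters
    MaxPooling2D((2, 2)),
    Conv2D(32, (5, 5), activation="relu"),   # filter count assumed
    MaxPooling2D((2, 2)),
    Dropout(0.25),                           # rate assumed
    Flatten(),
    Dense(256, activation="relu"),           # width assumed
    Dense(1, activation="sigmoid"),          # fake vs. real
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# history = model.fit(X_train, y_train, epochs=20, batch_size=32,
#                     validation_data=(X_test, y_test))
```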

4 Simulation Result

4.1 Performance Metrics

Accuracy and precision: Accuracy is the degree to which a measured value is close to its true value, while precision refers to how closely repeated measurements agree with each other. In classification terms, accuracy is the proportion of correct classifications among all classifications.

Recall/Sensitivity: Sensitivity is the proportion of true positives to the total number of actual positives. Similarly, specificity, also known as the true negative rate, is the proportion of true negatives to the total number of actual negatives [18].

F1-Score: Although the model's accuracy exceeds 90%, we also report the F1-score, which gives a better indication of incorrectly classified cases. It is computed as the harmonic mean of precision and recall. Accuracy is appropriate when true positives and true negatives dominate, whereas the F1-score is a better metric when the class distribution is imbalanced and false positives and false negatives matter more [19]. The formulas for all metrics are shown below.

$$accuracy = \frac{TP + TN}{{TP + TN + FP + FN}}$$
(1)
$$precision = \frac{TP}{{TP + FP}}$$
(2)
$$recall = \frac{TP}{{TP + FN}}$$
(3)
$$F{\text{-}}score = \frac{2}{1/precision + 1/recall}$$
(4)
$$specificity = \frac{TN}{{TN + FP}}$$
(5)
$$sensitivity = \frac{TP}{{TP + FN}} = recall$$
(6)
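For reference, the sketch below evaluates formulas (1)–(6) from the confusion counts of a binary classifier; thresholding the sigmoid output at 0.5 is an assumption.

```python
# Compute accuracy, precision, recall/sensitivity, F1-score and
# specificity from the confusion-matrix counts, mirroring Eqs. (1)-(6).
import numpy as np
from sklearn.metrics import confusion_matrix

def evaluate(y_true, y_prob, threshold=0.5):
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # identical to sensitivity
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   precision,
        "recall":      recall,
        "f1":          2 / (1 / precision + 1 / recall),
        "specificity": tn / (tn + fp),
    }
```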

4.2 Confusion Matrix

In a classification problem, a confusion matrix is a tabular summary of prediction outcomes, with count values broken down by class. As the name implies, it shows how a classification model performs when making predictions and reveals not only the errors made by the classifier but also the types of those errors [20]. Correct classifications lie on the main diagonal of the matrix, while misclassifications appear off the diagonal. The matrix is shown in Fig. 4, and a small plotting sketch is given after it.

Fig. 4 Confusion matrix showing the true and predicted labels
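A plot like Fig. 4 can be produced with scikit-learn; the class label names and the y_test/y_pred variables (carried over from the earlier sketches) are assumptions.

```python
# Sketch: display the confusion matrix of the trained classifier.
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_test, y_pred)            # counts per (true, predicted) class
ConfusionMatrixDisplay(cm, display_labels=["real", "fake"]).plot(cmap="Blues")
plt.title("Confusion matrix")
plt.show()
```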

Figure 5 shows the accuracy and loss curves for the evaluated results. Table 2 compares the baseline and proposed results; the proposed system achieves an accuracy of 0.91.

Fig. 5 Loss and accuracy graphs

Table 2 Comparison of the base and the proposed results

The precision, recall, and F1-score of the proposed model are 0.97, 0.85, and 0.91, respectively. The fake and real confidence scores of the resulting images are shown below.
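As a consistency check, substituting the reported precision and recall into Eq. (4) reproduces the stated F1-score:

$$F{\text{-}}score = \frac{2}{1/0.97 + 1/0.85} = \frac{2 \times 0.97 \times 0.85}{0.97 + 0.85} \approx 0.91$$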

Figures 6, 7 and 8 show the fake and real confidence scores for spoofed images, i.e., duplicate photographs of people whose original images are maintained in the database. In other words, an intruder seeking access to the authorized system may have used any of these techniques.

Fig. 6 An example of a resultant image compared with the original image

Fig. 7 An example of a resultant image compared with the original image

Fig. 8 An example of a resultant image compared with the original image

5 Conclusion

Face recognition has become an essential technique for achieving security as AI is increasingly used in everyday life, and FAS has become a pressing issue in the fight against malicious attacks. Research on face spoofing detection has evolved continuously, from early manual FE methods based on image texture, image quality, and depth information to DL methods that extract features automatically, combined with network upgrades, feature fusion, and domain generalization; the efficiency and effectiveness of detection have now reached a significant level.