
1 Introduction

Face recognition technology (FRT) is a biometric technique that captures an image of a face and uses it either to verify a claimed identity or to identify an individual by linking them to their recorded information [1]. For instance, it is frequently deployed at airport security checkpoints. Although this use improves efficiency, its effectiveness depends on the processing capability of the system and on its specific application [2]. Attendance and access control are where face recognition technology finds its widest application, in terms of implementation design [3], security [4], and finance. Face recognition technology is also used in logistics, retail, smartphones, transportation, education, real estate, government administration, entertainment promotion, and network information security [5,6,7], and additional sectors are starting to adopt it. In security, facial recognition supports both the early detection of suspicious incidents and the tracking of suspects [8]. Face recognition and related technologies have gone through several stages of development, three of which are discussed here. The first stage is the Early Algorithm Stage, which includes Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) [9]; of these, PCA is widely recognized as the most commonly used method for reducing data dimensionality [10,11,12,13]. The second stage, the Artificial Features and Classifiers Stage, incorporates techniques such as Support Vector Machines (SVM), AdaBoost, small-sample methods, and neural networks. The final stage is Deep Learning, a subset of machine learning that has revolutionized the face recognition industry.
Unlike the previous stages, which require hand-crafted feature extraction, deep learning automatically learns the features needed for classification during training. This advancement has had a profound impact on the field of face recognition [14]. The Convolutional Neural Network (CNN) is one of the main deep learning architectures used in face recognition. A CNN uses local receptive fields, shared weights, and downsampling of facial images to exploit the locality and other distinctive characteristics of image data [15]. Some image processing pipelines also use the Canny edge detection algorithm, which is useful for detecting a wide range of edges in images [9].
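The three CNN ingredients named above (local receptive fields, shared weights, and downsampling) can be illustrated with a minimal NumPy sketch: a single 3×3 kernel slid over a grayscale image, followed by 2×2 max-pooling. The function names, kernel, and toy image are illustrative assumptions; this is not the network used in this work, and real CNNs stack many such layers with learned kernels and nonlinearities.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide one shared kernel over a 2-D image (no padding, stride 1).

    Each output value depends only on a local kh x kw patch (local receptive
    field), and the same kernel weights are reused at every position (weight
    sharing). Like most CNN libraries, this computes cross-correlation.
    """
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map: np.ndarray, size: int = 2) -> np.ndarray:
    """Downsample by keeping the maximum of each non-overlapping size x size block."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop edge rows/cols that do not fill a block
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


# Toy 6x6 "image" whose intensity increases by 1 per column, and a vertical-edge kernel.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
features = conv2d_valid(image, kernel)  # shape (4, 4)
pooled = max_pool2d(features)           # shape (2, 2)
```

Every entry of `features` is −6 here: in each row of a 3×3 patch the values are a, a + 1, a + 2, so the kernel row gives a − (a + 2) = −2, and three rows sum to −6. Pooling then halves each spatial dimension.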

Anti-spoofing refers to the measures taken to counteract spoofing attacks, in which an attacker manipulates data to impersonate someone else and gain unauthorized access [16]. The IJCB 2011 competition on countermeasures to 2D facial spoofing attacks [17] was a significant collective effort to identify efficient methods for non-intrusive spoofing detection. Multi-modal analysis [18,19,20], challenge-response techniques [21], and multispectral imaging [22] all offer effective ways to distinguish between real and fake faces; however, their practicality is limited because they require user interaction or specialized imaging hardware. Hence, there is strong interest in anti-spoofing techniques that can be incorporated into existing face authentication systems without requiring user cooperation and that work with standard imaging equipment. One key cue for verifying that a face is genuine is eye blinking, and various automated methods exist for detecting eye blinks in video frames. Typically, the Viola-Jones detector [23] is used to locate the face and facial landmarks, after which adaptive thresholding is applied and the optical flow around the eyes is computed. Finally, the eye's motion is estimated by correlation-based template matching against templates of open and closed eyes.
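The final correlation-matching step can be sketched as follows, assuming the eye region has already been cropped from each frame and resized to the template size. The templates, the function names, and the rule that a blink is a run of "closed" frames between "open" frames are illustrative assumptions, not the specific method of the works cited above.

```python
import numpy as np


def ncc(patch: np.ndarray, template: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equally sized arrays (-1..1)."""
    a = patch.astype(float) - patch.mean()
    b = template.astype(float) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def eye_state(patch, open_template, closed_template) -> str:
    """Label an eye patch by whichever template it correlates with more strongly."""
    if ncc(patch, open_template) >= ncc(patch, closed_template):
        return "open"
    return "closed"


def count_blinks(states) -> int:
    """Count runs of 'closed' frames that start after an 'open' frame and end with one."""
    blinks, seen_open, in_closed_run = 0, False, False
    for state in states:
        if state == "closed":
            in_closed_run = in_closed_run or seen_open
        else:
            if in_closed_run:
                blinks += 1
                in_closed_run = False
            seen_open = True
    return blinks


# Toy 4x6 templates: a dark "pupil" blob for an open eye, a single lid line for a closed eye.
open_template = np.ones((4, 6))
open_template[1:3, 2:4] = 0.0
closed_template = np.ones((4, 6))
closed_template[2, :] = 0.0
```

Zero-mean normalization makes the comparison insensitive to uniform changes in brightness and contrast, which matters when lighting varies between frames.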

Since early 2020, the COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, has afflicted the world. Throughout this prolonged period, face recognition technology has enabled contactless applications [24,25,26,27]. For example, it removes the need for students to be physically present for authentication and allows teachers to take attendance without anyone touching a fingerprint scanner. Thang Long University in Vietnam has piloted this technology for classroom attendance: its face recognition system, "TLnet," automatically identifies and records the faces of students in class [28]. Similarly, the smart home company Vconnex launched a smart lock that supports face recognition login [29]. However, neither of these applications is able to prevent face spoofing.

Moreover, in most Enterprise Information System institutions in Vietnam, facial recognition and anti-spoofing technology are not implemented, so employees must use identity cards to check in instead of their faces. Card-based check-in is not contactless and is therefore risky during a pandemic. After examining these issues, we proposed to develop a comprehensive system. Our aim was to investigate and evaluate suitable methods for facial recognition and for secure anti-spoofing. We used a Convolutional Neural Network (CNN) as the core component of a real-time facial recognition application that detects faces, and incorporated Landmark68 for anti-spoofing to determine whether a face is genuine.

During the COVID-19 pandemic, when it was difficult for people to check in with their IDs at the counter or entrance of Enterprise Information System institutions, we applied our research findings and created a user-friendly application called AILib. People can now log in to AILib with their faces, eliminating the need to be physically present. The system collects user facial data to improve face recognition accuracy.

Our results show that the system performs satisfactorily, reaching a best accuracy of 98.42%. We also found that the best face recognition threshold for Asian faces was 0.4, while other face types required different values. For anti-spoofing, the best threshold values for left, right, and front faces were d < −50, d < −150, and d > −50, respectively. The approach is practical to deploy and contributes to the innovative application of Artificial Intelligence to make people's lives safer and more secure.

This paper presents several noteworthy contributions:

  a)

    Creation of a comprehensive system: The authors developed and implemented AILib, a system that combines facial recognition technology with anti-spoofing measures. It allows users to log in with their faces, eliminating the need for physical presence, a feature that is particularly advantageous during outbreaks such as COVID-19 and Monkeypox.

  b)

    Use of a Convolutional Neural Network (CNN) for face recognition: The authors incorporated a CNN as the main component of their real-time face recognition application. This deep learning approach learns discriminative facial features automatically, without hand-crafted feature engineering.

  c)

    Implementation of Face Landmark/Landmark68 for anti-spoofing: To verify authenticity and prevent spoofing, the authors employed the Face Landmark/Landmark68 technique. Using facial landmarks, the system prompts users to perform random actions, which makes it much harder to authenticate with a pre-recorded or fake video.

  d)

    Determination of optimal threshold values: Through testing and analysis, the authors identified suitable threshold values for face recognition and anti-spoofing. For Asian faces, a threshold of 0.4 was the most effective for face recognition. For anti-spoofing, the values d < −50 for the left pose, d < −150 for the right pose, and d > −50 for the front pose were identified.

  e)

    Practical implementation in real-world scenarios: The proposed system has been implemented and tested in real-world environments. These practical demonstrations show its potential to improve people's lives by offering a secure and convenient authentication method.

2 Methodology

As shown in Fig. 1, the architecture of the proposed method comprises the following key components:

Front-end Client:

  • This is a web-based interface that enables user interaction.

  • It captures the user’s face using the camera on their device.

  • The captured face image is then sent to the back-end server for further processing.

Back-end Server:

  • Upon receiving the user's face image from the front-end client, the server performs real-time face detection using the Tiny Face Detector model.

  • Facial features are extracted from the detected face utilizing a Deep Convolutional Neural Network (CNN).

  • These facial features are encoded into a vector representation, which is then matched with existing face encodings in the database.

  • Additionally, it employs the Face Landmark/Landmark68 approach to detect and locate specific facial points such as eyes, nose, and mouth relative to the overall face structure.

  • As an added security measure against spoofing attempts, users are prompted to perform random actions such as smiling or turning their head left or right.

Fig. 1. Proposed Methodology Architecture

Face Database:

  • This component serves as a repository for storing and managing facial encodings of registered users.

  • During the face recognition process, these encodings are compared for identification purposes.
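Since the back-end stores user information in SQLite (Section 6.1), one plausible way to keep the 128-dimensional face encodings in the same database is to serialize each vector as a float32 BLOB. The table name, columns, and helper functions below are hypothetical; this is a minimal sketch using Python's built-in `sqlite3` module, not the authors' schema.

```python
import sqlite3

import numpy as np

ENCODING_DIM = 128  # length of one face encoding (Section 3.2)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with one row per stored face encoding."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS face_encodings ("
        "  id INTEGER PRIMARY KEY,"
        "  user_id TEXT NOT NULL,"
        "  encoding BLOB NOT NULL)"
    )
    return conn


def add_encoding(conn: sqlite3.Connection, user_id: str, encoding) -> None:
    """Store one encoding as raw float32 bytes; a user may have several rows."""
    vec = np.asarray(encoding, dtype=np.float32)
    if vec.shape != (ENCODING_DIM,):
        raise ValueError(f"expected shape ({ENCODING_DIM},), got {vec.shape}")
    conn.execute(
        "INSERT INTO face_encodings (user_id, encoding) VALUES (?, ?)",
        (user_id, vec.tobytes()),
    )
    conn.commit()


def load_encodings(conn: sqlite3.Connection):
    """Return [(user_id, encoding), ...] with encodings decoded back to float32 arrays."""
    rows = conn.execute(
        "SELECT user_id, encoding FROM face_encodings ORDER BY id"
    ).fetchall()
    return [(user_id, np.frombuffer(blob, dtype=np.float32)) for user_id, blob in rows]
```

Allowing several rows per user fits the idea of collecting facial data over time; at login, the decoded encodings are what the probe encoding would be compared against.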

Anti-Spoofing Verification:

  • To ensure authenticity, this feature measures distances between specific facial points (e.g., point 36 and point 18, point 45 and point 25) in order to detect any noticeable facial movements or changes.

  • It analyzes various aspects of facial expressions and movements to verify user authentication.

AILib Application:

  • This application provides a user-friendly platform for secure logins.

  • Instead of relying on physical presence, users can conveniently log in using their unique facial features.

  • Furthermore, regular collection of user facial data helps enhance accuracy over time.

3 Face Recognition Process

3.1 Face Detection

Face detection is the task of locating the position and size of a person's face within a digital image. It is the first and a crucial step in face recognition. In our research, we used the Tiny Face Detector model for real-time face detection [30]. For resource-constrained clients and mobile devices, the Tiny Face Detector is our preferred face detector: its small model size and low computational cost make it well suited to mobile platforms and web applications. The Tiny Face Detector is essentially a smaller version of the Tiny YOLO v2 model, which has also been employed for object detection in automated-vehicle research; it replaces the regular convolutions used in YOLO with depthwise separable convolutions [25, 31].
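The saving from depthwise separable convolutions can be checked with simple arithmetic. A standard k×k convolution from C_in to C_out channels has k²·C_in·C_out weights, whereas a depthwise k×k convolution followed by a 1×1 pointwise convolution has k²·C_in + C_in·C_out, a ratio of 1/C_out + 1/k². The layer sizes in this sketch (k = 3, 64 to 128 channels) are illustrative assumptions, not the Tiny Face Detector's actual configuration, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a regular k x k convolution: one k x k x c_in filter per output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise step (one k x k filter per input channel) plus 1 x 1 pointwise mixing."""
    return k * k * c_in + c_in * c_out


k, c_in, c_out = 3, 64, 128
regular = standard_conv_params(k, c_in, c_out)          # 3*3*64*128 = 73,728
separable = depthwise_separable_params(k, c_in, c_out)  # 3*3*64 + 64*128 = 8,768
print(f"regular={regular}, separable={separable}, ratio={separable / regular:.3f}")
# prints: regular=73728, separable=8768, ratio=0.119
```

For this layer the separable version needs roughly 8.4 times fewer weights, which illustrates why the approach suits the resource-constrained clients mentioned above.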

The Convolutional Neural Network (CNN) is one of the most widely used and well-known deep learning (DL) networks [32, 33]. Much of DL's current popularity can be attributed to CNNs, which surpass their predecessors by identifying relevant features automatically, without human intervention. This ability is the main reason for their widespread adoption in fields such as computer vision [34], audio processing [35], and face recognition [36].

3.2 Face Encoding Process

After an image is received, the system performs the Face Encoding Process: it analyzes the image, extracts facial features, and represents them as a vector. The encoding network is trained on triplets of face images, in which Image 1 and Image 2 show the same person and Image 3 shows a different person. For each face, the network generates 128 measurements that together capture facial characteristics (informally, properties such as color, size, the slant of the eyes, and the distance between the eyebrows). During training, the network is adjusted slightly so that the measurements for Image 1 and Image 2 move closer together, while the measurements for Image 2 and Image 3 move further apart. This step is repeated millions of times over millions of images of thousands of people, so that the network learns to generate reliable 128 measurements for each person. Consequently, ten different pictures of the same person should yield approximately the same set of measurements [37].
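The training objective described above corresponds to the triplet loss popularized by FaceNet. A minimal NumPy sketch for a single triplet follows. It uses FaceNet's formulation, which compares the anchor (Image 1) with both the positive (Image 2) and the negative (Image 3), and FaceNet's margin of 0.2; both choices are assumptions about the underlying model [37], not details given in this paper.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """FaceNet-style triplet loss for one (anchor, positive, negative) triplet.

    anchor/positive: encodings of the same person (Image 1, Image 2).
    negative: encoding of a different person (Image 3).
    Distances are squared Euclidean. The loss is zero once the negative is
    farther from the anchor than the positive by at least `margin`; otherwise
    its gradient pulls the positive closer and pushes the negative away.
    """
    anchor, positive, negative = (
        np.asarray(v, dtype=float) for v in (anchor, positive, negative)
    )
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(d_pos - d_neg + margin, 0.0))
```

In FaceNet the encodings are also L2-normalized before these distances are computed, so the margin is measured on a fixed scale.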

4 Face Anti-Spoofing Method

4.1 Face Landmark/Landmark68

Facial landmark detection identifies the locations of the eyes, nose, and mouth relative to the overall facial structure by searching for the key points that make up the shape of the face in an image. The process has two steps: (1) locating the face within the image, and (2) detecting the facial structures. Although a face contains numerous key points, we focus on the essential regions: the mouth, right eyebrow, left eyebrow, left eye, right eye, nose, and jaw. The system uses the "dlib" library as its foundation [38]. As shown in Fig. 2, this method determines 68 key points, each given by (x, y) coordinates.
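For reference, the 68 points follow the iBUG 300-W annotation scheme used by dlib's 68-point shape predictor. The sketch below records the conventional 0-based index range of each facial region ("right" and "left" are named from the subject's point of view, so the right eye appears on the left of the image) and slices one region out of a list of 68 (x, y) points. In dlib the points would come from `dlib.shape_predictor(...)` applied to a face rectangle from `dlib.get_frontal_face_detector()`; here the landmark list is assumed to be already extracted, and the helper name is hypothetical.

```python
# Conventional 0-based index ranges of the 68-point layout (end index exclusive).
LANDMARK_GROUPS = {
    "jaw": range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth": range(48, 68),
}


def region_points(landmarks, region: str):
    """Return the (x, y) points of one facial region from a list of 68 (x, y) tuples."""
    if len(landmarks) != 68:
        raise ValueError(f"expected 68 landmarks, got {len(landmarks)}")
    return [landmarks[i] for i in LANDMARK_GROUPS[region]]
```

Assuming 0-based numbering, points 36 and 45 used in the anti-spoofing test are the outer corners of the right and left eyes under this convention.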

Fig. 2. 68 key points on a human face

Fig. 3. Face Change Position

Because the face is described by 68 points, any change in the position of these points changes the distances between them. Our system uses this property to prompt users to smile, look left, and look right; these requests are issued randomly to prevent users from preparing fake videos in advance. To calculate and identify facial movement, we use landmark data obtained through an API based on the "dlib" landmark detector, which tracks the 68 key points, with their (x, y) coordinates, that make up the human face. Fig. 3 illustrates the three poses: frontal, left yaw, and right yaw.

Using the face landmark API, we obtain the x and y coordinates of the 68 key points on a person's face. By measuring the distances between these points, we can detect facial movement. We use the Euclidean distance [39] to calculate the distance between each pair of points:

$$d(p,q)=\sqrt{(q_1-p_1)^2+(q_2-p_2)^2},\qquad p=(p_1,p_2),\; q=(q_1,q_2)$$

Based on the Euclidean distance, we can determine whether the face is looking at the camera, turned left, or turned right, and thereby check that the user in front of the camera is a real person.
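The distance computation, together with the quantities l, r, and d defined in Section 6.2.2, can be sketched in Python as follows. The landmark list is assumed to hold 68 (x, y) tuples indexed by the paper's point numbers; the function names are hypothetical. Python's `math.dist` computes the same two-dimensional Euclidean distance.

```python
import math


def euclidean(p, q) -> float:
    """d(p, q) = sqrt((q1 - p1)^2 + (q2 - p2)^2) for 2-D points p and q."""
    (p1, p2), (q1, q2) = p, q
    return math.sqrt((q1 - p1) ** 2 + (q2 - p2) ** 2)


def pose_features(landmarks):
    """Return (l, r, d): l = d(point 36, point 18), r = d(point 45, point 25), d = l - r."""
    l = euclidean(landmarks[36], landmarks[18])
    r = euclidean(landmarks[45], landmarks[25])
    return l, r, l - r
```

When the head turns, one eye-to-eyebrow distance shrinks in the image while the other stays similar or grows, so the sign and size of d indicate the direction of the turn.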

4.2 Face Expression

Human emotions are commonly recognized by analyzing either physical or sensory signals. Physical signals include facial expressions, speech, and gestures, while sensory signals contribute to the expression of six fundamental emotions [40]. We can detect human expressions from facial landmarks and use this information for various purposes, including anti-spoofing. We use the Euclidean distance formula [39] to measure how the distances between landmark points change, and infer the facial expression from these changes.
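A minimal sketch of this idea compares the distance between two landmarks in a neutral frame with the same distance in the current frame, and reports an expression change when the relative difference is large enough. The default point pair 63 and 67 is taken from the anti-spoofing test in Section 6.2.2; the 30% threshold and the function name are hypothetical and would need tuning against real data.

```python
import math


def expression_changed(neutral, current, i: int = 63, j: int = 67,
                       min_change: float = 0.3) -> bool:
    """True if the distance between landmarks i and j changed by at least
    `min_change`, measured relative to that distance in the neutral frame."""
    base = math.dist(neutral[i], neutral[j])
    now = math.dist(current[i], current[j])
    if base == 0:
        return now > 0  # if the points coincided, any separation counts as a change
    return abs(now - base) / base >= min_change
```

Using a ratio rather than a raw pixel difference keeps the check roughly independent of how far the user sits from the camera.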

5 Proposed Process of Application

In summary, we used the Tiny Face Detector model to detect faces in real time. Once an image is received, the system analyzes it, extracts facial features, and transforms them into a vector representation. This required training a Deep Convolutional Neural Network (DCNN) to generate precise measurements of facial features.

The training data consisted of image triplets: two images of the same person and one image of a different person. After training, the network could reliably generate 128 measurements for each person. The encoding of the person's image stored in the database is then compared with the encoding of the face image sent by the web client.

Additionally, we employed the Face Landmark/Landmark68 approach to determine the positions of the eyes, nose, and mouth relative to the face. This method involves two steps: first, locating the face in the image, and second, detecting the facial features. Each of the 68 points on the human face is given by (x, y) coordinates, and any movement of these points changes the distances between them. We measure these distances with the Euclidean distance formula [39]. To determine whether the user in front of the camera is genuine, we use facial expression analysis to observe these changes.

Our system uses this method as an additional measure by randomly asking the user to smile, look left, or look right. Because the requested action is unpredictable, it is difficult to satisfy with a pre-recorded video. As depicted in Fig. 4, the user first opens the checking client and is prompted to look into the camera. The client then captures an image and sends it to the back-end server.

Fig. 4. System Use Case Diagram

The image is passed to the trained convolutional neural network model, which encodes the input image into facial measurements and compares them with the existing face encodings in the dataset. If the input image matches a stored face, the system proceeds to the anti-spoofing step: the user is prompted to smile or perform another random action, and once the action is captured and verified, the user is logged in to their account on the AILib platform.
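The login flow described above can be sketched as a recognition check followed by a random challenge-response loop. The callback `perform_challenge`, the challenge names, and the number of rounds are hypothetical placeholders for the client prompt and the landmark-based checks; the sketch only shows the control flow.

```python
import random
from typing import Callable, Optional

CHALLENGES = ("smile", "look_left", "look_right")


def liveness_check(perform_challenge: Callable[[str], str], rounds: int = 2,
                   rng: Optional[random.Random] = None) -> bool:
    """Ask for `rounds` randomly chosen actions; pass only if each one is performed.

    `perform_challenge(challenge)` is assumed to prompt the user, capture frames,
    and return the action actually detected from the landmarks.
    """
    rng = rng or random.Random()
    for _ in range(rounds):
        challenge = rng.choice(CHALLENGES)
        if perform_challenge(challenge) != challenge:
            return False
    return True


def login(matched_user: Optional[str],
          perform_challenge: Callable[[str], str]) -> Optional[str]:
    """Return the user to log in, or None if recognition or the liveness check fails."""
    if matched_user is None:  # no stored encoding matched the probe image
        return None
    return matched_user if liveness_check(perform_challenge) else None
```

Drawing each challenge at random is what makes a single pre-recorded video unlikely to pass: it would have to contain the right actions in the right order.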

6 Results

6.1 Technologies Used

Our system was designed to incorporate both face recognition and anti-spoofing, and we evaluated face matching in various environments with different lighting conditions. We used HTML, CSS, and JavaScript for the web front-end, Python for the back-end API, Flask as the web framework, and SQLite for storing user information. For face-related functionality such as face detection, Landmark68, facial expression, and gender recognition, we used Node.js with TensorFlow.js and face-api.js. To ensure accurate face detection and to reject fake faces, we used the Tiny Face Detector model and Landmark68. The system then underwent two stages of testing: a face recognition test and a face anti-spoofing test.

6.2 The Best Threshold for Application

6.2.1 Face Recognition Threshold Value

To determine the optimal face recognition threshold, the face recognition test consisted of two primary parts: true authentication tests and false acceptance tests. These tests aimed to identify the best threshold value and to measure the time taken for face recognition. Each individual was verified by comparing images captured with various cameras under different lighting conditions. Face recognition time was measured over 30 trials for each combination of templates and lighting configurations. We tested threshold values of 0.6, 0.55, 0.5, 0.45, and 0.4, repeating each test 100 times, to determine the best value for Asian faces.

In each trial, the system captured a single frame of face video, loaded the corresponding digital template, searched for a face, and compared it with the existing dataset to find a match. Whenever a match was found, the trial saved the current accuracy before proceeding to the next image in the dataset. The trial ended once all of the subject's face images had been scanned, and the highest accuracy achieved was recorded. If no match was found after scanning all of the subject's images, the trial stopped and returned the message "not found." When a face match is found, the system displays the user's name together with the accuracy and other relevant details (Fig. 5).
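One trial of this procedure, and the R/W/NF labels used in Table 1, can be sketched as follows. A stored face counts as a match when the Euclidean distance between its encoding and the probe encoding is at most the threshold (tolerance). Reporting "accuracy" as 1 − distance is an assumption, since the text does not define how accuracy is computed, and all names are hypothetical.

```python
import numpy as np


def run_trial(probe, gallery, threshold: float = 0.4):
    """Scan all stored encodings and keep the best match within `threshold`.

    gallery: iterable of (name, encoding) pairs, e.g. 128-d vectors.
    Returns (name, accuracy) for the best match, or ("not found", None).
    """
    probe = np.asarray(probe, dtype=float)
    best_name, best_acc = None, None
    for name, encoding in gallery:
        dist = float(np.linalg.norm(np.asarray(encoding, dtype=float) - probe))
        if dist <= threshold:
            acc = 1.0 - dist  # assumed accuracy score: smaller distance, higher score
            if best_acc is None or acc > best_acc:
                best_name, best_acc = name, acc
    if best_name is None:
        return "not found", None
    return best_name, best_acc


def table_label(result_name: str, true_name: str) -> str:
    """Label a trial as in Table 1: R (right), W (wrong) or NF (not found)."""
    if result_name == "not found":
        return "NF"
    return "R" if result_name == true_name else "W"
```

Lowering the threshold makes a wrong match (W) less likely but a missed match (NF) more likely, which is the trade-off examined across the thresholds from 0.4 to 0.6.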

We performed ten tests using the threshold test case. For European faces, a tolerance of 0.6 yielded the best results, successfully recognizing individuals with only one to three pictures per subject. However, this setting was inaccurate for Asian faces. Table 1 presents the face recognition results for Asian faces. The system misidentified the input image when the threshold was set to 0.5 or 0.6, whereas thresholds below 0.5 produced no incorrect identifications. However, with thresholds below 0.45, the system in some cases could not match any subject.

Table 1. Detection of Asian faces at each threshold (W: wrong, R: right, NF: not found)

To ensure consistency, all tests were conducted in the same environment. According to Table 1, a tolerance of 0.4 is the most suitable for Asian faces. Fig. 5 shows the AILib App user interface after a successful recognition.

Fig. 5. System encoding a face, performing anti-spoofing, and displaying the result

6.2.2 Face Anti-Spoofing Threshold Value

During the face anti-spoofing test, three digital templates were used: the left face, the right face, and the smiling face. These trials assessed the distances between various points on the face. Whenever the face position changed during a trial, the system recorded the values of each point in the 68-landmark model, including the distances between points 36 and 18, points 45 and 25, and points 63 and 67.

Table 2. Value of d

Based on these measurements, we calculated the distances required for face anti-spoofing. After a successful spoof check, the Library Client application appears and displays the recognized user. As reported in Section 6.2.1, in ten trials using the threshold test case a tolerance of 0.6 gave the best face recognition results for European faces, with successful recognition from just one to three pictures per subject, whereas the results for Asian faces at that tolerance were inaccurate.

In the face anti-spoofing test case, we measured the distance between points 36 and 18 and the distance between points 45 and 25, and then tested whether the orientation of the face (left or right) could be detected. This test case yields three values: "l", the distance between points 36 and 18; "r", the distance between points 45 and 25; and "d", the difference between "l" and "r". Over ten tests, we recorded the values for a face perpendicular to the camera (smile), a face looking left, and a face looking right. All these values are presented in Table 2.

After conducting 20 tests, we discovered that a front-facing face has a value of “d” less than −50, a left-facing face has a value of “d” less than −150, and a right-facing face has a value of “d” greater than −50. These results are quite favorable when compared to state-of-the-art methods.
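A sketch of the resulting pose decision is shown below. The paragraph above assigns the thresholds to poses differently from the contribution list and the Conclusion (left: d < −50, right: d < −150, front: d > −50); this sketch follows the latter assignment as an assumption. Because any d below −150 is also below −50, the stricter test is checked first; the function name and defaults are hypothetical.

```python
def classify_pose(d: float, strict: float = -150.0, loose: float = -50.0) -> str:
    """Map d = l - r to a coarse head pose.

    Assumed assignment: "right" if d < strict, "left" if strict <= d < loose,
    otherwise "front". The stricter threshold is tested first because any
    d < -150 also satisfies d < -50. d == -50 exactly falls through to "front".
    """
    if d < strict:
        return "right"
    if d < loose:
        return "left"
    return "front"
```

During login, a challenge such as "look_left" would pass only when `classify_pose(d)` returns the requested pose for a captured frame.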

7 Conclusion

Our research aimed to develop a comprehensive system and involved careful evaluation at each stage, focusing on identifying an appropriate facial recognition method and implementing a robust anti-spoofing technique. This work resulted in the AILib application, which was particularly useful during the COVID-19 pandemic. People can now log in to AILib securely using their facial features, eliminating the need for physical presence, which is particularly advantageous during pandemics such as COVID-19 and Monkeypox.

To improve face recognition accuracy, our system collects user facial data. Our results show that the system performs satisfactorily, reaching a best accuracy of 98.42%. We also found that the ideal face recognition threshold for Asian faces is 0.4, while other face types require different thresholds. For anti-spoofing, our tests identified threshold values of d < −50 for the left pose, d < −150 for the right pose, and d > −50 for the front pose.

Further improvements to training and face recognition speed could be achieved by utilizing a client machine and a back-end server. This solution has practical applications and contributes to the field of Artificial Intelligence by promoting a better and safer way of life. Our work offers an opportunity to modernize traditional login and authentication systems by giving users a convenient and secure means of accessing and protecting their data.