
1 Introduction

With the rapidly growing application of face recognition systems in financial transactions and social security, face anti-spoofing has been receiving increasing attention due to its theoretical significance and practical value. It is an important technique that aims to protect the identity verification process from presentation attacks. Most conventional face anti-spoofing methods, whether based on hand-crafted features or on spatial and temporal features learned with deep neural networks, assume that the training and testing data are independent and identically distributed (i.e., the i.i.d. assumption). However, this assumption does not hold in many practical applications of face anti-spoofing because of large variations in image acquisition scenes, capturing devices, and the manufacturing processes of face artifacts.

Intuitively, it is extremely difficult, if not impossible, to collect training data that cover all possible variations of potential presentation attacks in real-world applications. A gap between the distributions of training and testing data will therefore always exist. Because of this discrepancy, directly applying the decision boundary learned from training data to testing images, which are usually unavailable during training, inevitably produces incorrect classification results and leads to a dramatic performance drop (see Fig. 1(a)). Therefore, improving the generalization performance on unseen testing scenarios is critical for developing practical face anti-spoofing methods.

Fig. 1. Illustration of (a) the gap between the distributions of training and testing data, (b) the problem definition of domain generalization-based face anti-spoofing, where P(X) refers to the data distribution of domain X, and (c) an explanation of all notations used.

Domain generalization [1] is one of the most promising techniques for solving this problem. It aims to learn, from one or several different but related source domains, a model that generalizes well to testing domains unseen during training. This goal is largely consistent with that of face anti-spoofing on unseen testing scenarios (see Fig. 1(b)). Domain generalization was therefore first introduced to this field in [2] and has been extensively explored in subsequent work, which is summarized in Table 1. Given the remarkable progress and the large number of studies in domain generalization-based face anti-spoofing, in this paper we present a comprehensive survey of research in this field to summarize existing approaches and forecast promising future research directions. To the best of our knowledge, this is the first literature review of face anti-spoofing methods that focuses on their generalization ability on unseen testing scenarios. Most previous surveys mainly discuss conventional methods that do not consider the domain discrepancy.

The rest of this article is organized as follows. In Sect. 2, we review existing methods in this field and present a taxonomy for systematic summarization and comparison. In Sect. 3, we introduce commonly used datasets and evaluation metrics, and analyze the performance of existing methods to uncover important factors affecting generalization performance. Afterward, we discuss potential research directions in Sect. 4 to inspire future work on improving the generalization performance of face anti-spoofing. Finally, we draw conclusions in Sect. 5.

2 Methodologies

In this section, we review existing domain generalization-based face anti-spoofing methods in detail by comparing and analyzing their motivations, highlights, and common technical characteristics. Table 1 gives an overview and a taxonomy of these methods. It is worth noting that all these methods are categorized according to their main innovations.

Table 1. Overview of domain generalization-based face anti-spoofing methods.

2.1 Domain Alignment-Based Methods

Most existing domain generalization-based face anti-spoofing methods fall into this category, in which domain-invariant features are learned by minimizing the distribution discrepancies among source domains. Depending on the technique adopted for aligning the source domains, these methods can be further divided into two sub-categories: maximum mean discrepancy minimization-based approaches and domain adversarial learning-based approaches.
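To make the first sub-category concrete, the following is a minimal NumPy sketch of a multi-source maximum mean discrepancy (MMD) term. It assumes a Gaussian kernel, the biased MMD estimator, and an average over all pairs of source domains. These are illustrative choices, not necessarily those of the surveyed methods. In training, such a term would be added to the classification loss and minimized with respect to the feature extractor. The sketch only evaluates its value.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between feature samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy

def multi_domain_mmd(features_per_domain, sigma=1.0):
    """Average squared MMD over all pairs of source domains (assumed aggregation)."""
    n = len(features_per_domain)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += mmd2(features_per_domain[i], features_per_domain[j], sigma)
            pairs += 1
    return total / pairs

# Toy features: a and b share a distribution, c is shifted (hypothetical data).
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(200, 8))
b = rng.normal(0.0, 1.0, size=(200, 8))
c = rng.normal(2.0, 1.0, size=(200, 8))
print(mmd2(a, b, sigma=2.0), mmd2(a, c, sigma=2.0))
```

The aligned pair (a, b) yields a much smaller discrepancy than the shifted pair (a, c). A regularizer of this kind pushes features from different source domains toward the first situation.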

Li et al. [2] first introduce domain generalization to face anti-spoofing. They design a regularization term that minimizes the maximum mean discrepancy (MMD) distance among different domains to improve the generalization ability of learned features. Unlike [2], most methods [3,4,5,6,7,8,9,19] are based on the idea of domain adversarial learning [24]. Shao et al. [3] propose an adversarial learning scheme between a shared feature generator and multiple domain discriminators. This scheme aligns the marginal distributions among different domains and learns domain-invariant features. Considering the rich diversity of spoof face images, some work [4, 5, 8, 9] proposes to handle the distributions of live and spoof face images in greater detail. Jia et al. [5] align the distributions of live face images by single-side adversarial learning. They also use an asymmetric triplet loss to separate the spoof face images of each domain while aggregating the live ones of all domains. In [9], low-rank decomposition is used to extend the work of [5] and improve the robustness of the live/spoof face classifier. Furthermore, Jiang et al. [8] align the conditional distributions of both live and spoof faces from different domains to learn domain-invariant conditional features. Moreover, [6, 7] propose to perform refined feature alignment by assigning different attention to samples, features, and image regions during domain adversarial training.
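The asymmetric treatment of live and spoof faces can be illustrated with a label-assignment trick. Live faces from every domain share one class, while spoof faces from each domain receive their own class. A standard triplet loss over these labels then pulls all live faces together and pushes the spoof clusters of different domains apart. The sketch below assumes squared Euclidean distances, "batch-all" triplet mining, and a hypothetical margin of 0.1. It is not the exact formulation of [5] and omits the single-side adversarial branch.

```python
import numpy as np

def asymmetric_labels(is_live, domain_ids):
    """Live faces of all domains get label 0; spoof faces get one label per domain."""
    is_live = np.asarray(is_live, dtype=bool)
    domain_ids = np.asarray(domain_ids)
    return np.where(is_live, 0, domain_ids + 1)

def batch_all_triplet_loss(features, labels, margin=0.1):
    """Mean hinge loss over all valid (anchor, positive, negative) triplets."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances d[i, j].
    d = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    # valid[a, p, n]: p shares a's label (p != a) and n has a different label.
    valid = (same & not_self)[:, :, None] & (~same)[:, None, :]
    # losses[a, p, n] = d(a, p) - d(a, n) + margin, clipped at zero.
    losses = np.maximum(d[:, :, None] - d[:, None, :] + margin, 0.0)
    if not valid.any():
        return 0.0
    return float(losses[valid].mean())

# Two live faces (domains 0 and 1), then two spoof faces from each domain.
labels = asymmetric_labels([1, 1, 0, 0, 0, 0], [0, 1, 0, 0, 1, 1])
separated = np.array([[0, 0], [0, 0.01], [5, 0], [5, 0.01], [0, 5], [0, 5.01]])
collapsed = np.zeros((6, 2))
print(labels, batch_all_triplet_loss(separated, labels),
      batch_all_triplet_loss(collapsed, labels))
```

Features that already form one live cluster and per-domain spoof clusters incur zero loss. Features collapsed to a single point incur the full margin on every triplet.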

Generally, features that are invariant to the domain shift among source domains are also more likely to generalize to the target domain. However, existing methods still struggle to guarantee generalization performance on unseen target domains when the source-target domain shift differs greatly from the source-source domain shift. In face anti-spoofing, many factors contribute to domain discrepancies, and the available training data are usually limited, so this problem deserves further study.

2.2 Meta-Learning-Based Methods

Meta-learning, also known as learning to learn, aims to learn general knowledge from episodes sampled from related tasks. It is a commonly used strategy for improving the generalization ability of models in domain generalization-based face anti-spoofing. Existing work has studied it from the aspects of feature learning [10, 14, 15], supervision information [11,12,13], and input data [16].

From the perspective of feature learning, episodes are usually built to simulate the unseen testing scenarios of face anti-spoofing. \(M-1\) of the \(M\) source domains are used as the meta-train set, and the remaining one is used as the meta-test set to simulate domain shift. After building episodes, Shao et al. [10] and Jia et al. [15] exploit a depth loss and a triplet loss to regularize the optimization process of meta-learners. Liu et al. [14] propose to adaptively select feature normalization methods to learn domain-invariant features. Domain labels are usually unknown in practice, so Chen et al. [11, 13] generate pseudo-domain labels by feature clustering during meta-learning instead of relying on given domain labels. Qin et al. [12] train a teacher network to learn how to supervise the live/spoof classifier so that it performs better, rather than using handcrafted labels as supervision information. Besides, meta-learning is also used to construct a learnable network that automatically extracts generalized input patterns, which improves the generalization performance of face anti-spoofing [16].
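The episode construction above can be sketched as a leave-one-domain-out split, combined with a first-order meta-update in the style of MLDG (meta-learning domain generalization). The update takes a virtual gradient step on the \(M-1\) meta-train domains, evaluates the gradient on the held-out meta-test domain at the virtually updated weights, and applies both gradients to the original weights. The sketch makes several assumptions. It uses a toy linear classifier with binary cross-entropy and hypothetical learning rates of 0.1. It drops second-order terms, and it omits the depth and triplet regularizers of [10] and [15]. It is not a reproduction of any surveyed method.

```python
import numpy as np

def build_episodes(domain_names):
    """Leave-one-domain-out episodes: M-1 domains for meta-train, 1 for meta-test."""
    return [([d for d in domain_names if d != held_out], held_out)
            for held_out in domain_names]

def logistic_grad(w, x, y):
    """Gradient of the mean binary cross-entropy of p = sigmoid(x @ w)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return x.T @ (p - y) / len(y)

def first_order_meta_step(w, data, meta_train, meta_test, inner_lr=0.1, outer_lr=0.1):
    """One first-order MLDG-style update on a single episode."""
    x_tr = np.concatenate([data[d][0] for d in meta_train])
    y_tr = np.concatenate([data[d][1] for d in meta_train])
    g_train = logistic_grad(w, x_tr, y_tr)
    w_virtual = w - inner_lr * g_train            # virtual step on meta-train domains
    x_te, y_te = data[meta_test]
    g_test = logistic_grad(w_virtual, x_te, y_te) # loss on the simulated unseen domain
    return w - outer_lr * (g_train + g_test)      # Hessian terms dropped (first order)

# Hypothetical toy data: four domains with different input shifts.
rng = np.random.default_rng(0)
data = {}
for i, name in enumerate(["O", "C", "I", "M"]):
    x = rng.normal(size=(50, 3)) + 0.5 * i
    y = (x[:, 0] - 0.5 * i > 0).astype(float)
    data[name] = (x, y)

w = np.zeros(3)
for _ in range(20):
    for meta_train, meta_test in build_episodes(list(data)):
        w = first_order_meta_step(w, data, meta_train, meta_test)
print(w)
```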

2.3 Disentangled Representation Learning-Based Methods

Forcing all learned features to be domain-invariant is challenging in face anti-spoofing because of the diversity of influencing factors, such as identities, types of spoof faces, acquisition environments, and acquisition devices. Intuitively, it is more feasible to align only the part of the learned features that strongly affects generalization performance while ignoring the rest. Disentangled representation learning is an effective way to achieve this. Disentangled representation learning-based face anti-spoofing methods generally decompose features into two parts: domain-shared features and domain-specific features. The domain-shared features are encouraged to be domain-invariant. The domain-specific features, which contain information about the various influencing factors, are suppressed.
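One generic way to realize such a two-part decomposition comes from domain separation networks. It uses two projection heads on a backbone feature and adds a soft orthogonality penalty between them, so that the two parts encode different information. The sketch below uses this idea together with a crude first-moment measure of how domain-invariant the shared part is. The projection matrices, the penalty, and the invariance measure are all illustrative assumptions. The methods in [17,18,19,20,21] use their own architectures and losses.

```python
import numpy as np

def split_features(z, w_shared, w_specific):
    """Project backbone features z (N x D) into domain-shared and domain-specific parts."""
    return z @ w_shared, z @ w_specific

def orthogonality_loss(shared, specific):
    """Squared Frobenius norm of shared^T @ specific; zero when the parts are orthogonal."""
    return float(np.sum((shared.T @ specific) ** 2))

def domain_mean_gap(shared, domain_ids):
    """Crude invariance measure: spread of the per-domain means of the shared features."""
    domain_ids = np.asarray(domain_ids)
    means = np.stack([shared[domain_ids == d].mean(axis=0) for d in np.unique(domain_ids)])
    return float(np.sum((means - means.mean(axis=0)) ** 2))

# Hypothetical example: the heads select disjoint coordinates of orthogonal features.
z = np.eye(4)
shared, specific = split_features(z, np.eye(4)[:, :2], np.eye(4)[:, 2:])
print(orthogonality_loss(shared, specific), orthogonality_loss(shared, shared))
```

During training, a method of this kind would minimize the orthogonality penalty and an invariance term on the shared part, while keeping the domain-specific part out of the live/spoof decision.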

Some work focuses on the negative impact of a single factor on generalization performance. In [17] and [21], identity-discriminative and camera-discriminative features are disentangled from liveness features to improve the generalization ability of learned features. Other methods take into account multiple spoof-irrelevant factors, such as identity, acquisition environment, and acquisition device. Kim et al. [19] present a doubly adversarial learning framework that suppresses these spoof-irrelevant factors and thereby enhances the generalization ability of face anti-spoofing on unseen domains. In [18], the learned VLAD features are separated into domain-shared and domain-specific features, and only the domain-shared features are enforced to be domain-invariant. Besides, Wang et al. [20] disentangle style features from content features. They design a contrastive learning loss that enhances liveness-related style features while suppressing domain-specific ones.
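A common way to separate "style" from "content" in CNN feature maps, familiar from AdaIN-style transfer, treats the channel-wise mean and standard deviation as style and the normalized map as content. The sketch below shows only this split and its inverse. Whether [20] uses exactly these statistics is an assumption here, and its contrastive loss is not reproduced. Swapping styles between samples, as in the last line, is one way to probe which statistics carry domain-specific information.

```python
import numpy as np

def split_style_content(feature_map, eps=1e-5):
    """feature_map: (C, H, W). Style = per-channel mean and std; content = normalized map."""
    mu = feature_map.mean(axis=(1, 2), keepdims=True)
    sigma = np.sqrt(feature_map.var(axis=(1, 2), keepdims=True) + eps)
    content = (feature_map - mu) / sigma
    style = np.concatenate([mu.ravel(), sigma.ravel()])
    return style, content

def recombine(style, content):
    """Inverse of split_style_content: re-apply channel statistics to a content map."""
    c = content.shape[0]
    mu = style[:c].reshape(c, 1, 1)
    sigma = style[c:].reshape(c, 1, 1)
    return content * sigma + mu

# Hypothetical feature maps from two samples (e.g., two different domains).
rng = np.random.default_rng(0)
fm_a = rng.normal(size=(3, 4, 5))
fm_b = rng.normal(loc=1.0, scale=2.0, size=(3, 4, 5))
style_a, content_a = split_style_content(fm_a)
style_b, _ = split_style_content(fm_b)
restyled = recombine(style_b, content_a)  # content of a rendered with the style of b
```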

2.4 Others

In addition to the above three categories, researchers have also tried to improve the generalization performance of face anti-spoofing from other perspectives. From a data augmentation perspective, Fang et al. [22] extract frequency maps with learnable frequency filters and use them as inputs. In [23], physical cues such as depth, material, and reflection maps are used as supervision to help the face anti-spoofing model learn features that generalize across domains.
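The frequency-map idea can be sketched with a Fourier-domain filter. The sketch transforms an image with a 2D FFT, multiplies the centred spectrum by a mask, and transforms back. In [22] the filters are learnable. Here they are replaced, as a simplifying assumption, by a fixed radial high-pass mask with a hypothetical cutoff. A learnable variant would treat the mask entries as trainable parameters.

```python
import numpy as np

def radial_mask(h, w, cutoff):
    """Binary high-pass mask on the centred frequency plane (fixed, hypothetical filter)."""
    yy, xx = np.mgrid[:h, :w]
    # After fftshift, the zero frequency sits at index (h // 2, w // 2).
    r = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    return (r >= cutoff).astype(float)

def frequency_map(gray_image, mask):
    """Filter a grayscale image in the Fourier domain and return the spatial result."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray_image))
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.real(filtered)

# A flat image has only a zero-frequency component, which the high-pass mask removes.
flat = np.full((8, 8), 3.0)
high_passed = frequency_map(flat, radial_mask(8, 8, cutoff=1))
```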

Table 2. Commonly used datasets for domain generalization-based face anti-spoofing.

3 Datasets and Evaluation

Datasets. OULU-NPU, CASIA-MFSD, Replay-Attack, and MSU-MFSD are the datasets most commonly used in domain generalization-based face anti-spoofing. An overview of their characteristics is provided in Table 2. These datasets differ considerably in acquisition devices, acquisition scenarios, and the manufacturing processes of face artifacts. These differences cause domain shift among the datasets and make it challenging for face anti-spoofing models to generalize. Most domain generalization-based face anti-spoofing methods follow the evaluation protocols in [3] to benchmark their performance. These protocols follow a leave-one-out strategy, in which one dataset is treated as the unseen target domain and the rest as source domains. Building a face anti-spoofing dataset is costly, so it is desirable to learn generalized models from as few source domains as possible. Therefore, the generalization ability of state-of-the-art face anti-spoofing methods is also evaluated with limited source domains.
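The leave-one-out protocols can be enumerated mechanically from the four dataset abbreviations used in Table 3. The limited-source generator below is a hypothetical generalization that lists every choice of a fixed number of source datasets. The surveyed papers evaluate only a subset of such combinations.

```python
from itertools import combinations

DATASETS = {"O": "OULU-NPU", "C": "CASIA-MFSD", "I": "Replay-Attack", "M": "MSU-MFSD"}

def leave_one_out_protocols(keys):
    """Each protocol trains on all datasets but one and tests on the held-out one."""
    return [("&".join(k for k in keys if k != target), target) for target in keys]

def limited_source_protocols(keys, num_sources):
    """Hypothetical: every choice of num_sources source datasets, tested on each remaining one."""
    protocols = []
    for sources in combinations(keys, num_sources):
        for target in keys:
            if target not in sources:
                protocols.append(("&".join(sources), target))
    return protocols

for sources, target in leave_one_out_protocols(list(DATASETS)):
    print(f"{sources} to {target}")
```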

Although domain generalization-based face anti-spoofing has made great progress in recent years, a gap remains between academic experimental settings and practical application environments. This gap stems from differences in sample quantity and data distribution. To better simulate practical application scenarios, Costa-Pazo et al. [20, 29] aggregate more than ten existing face anti-spoofing datasets. They design protocols based on acquisition devices, acquisition scenarios, and types of face artifacts to evaluate the generalization performance of face anti-spoofing models.

Table 3. Comparison of state-of-the-art face anti-spoofing methods on four testing sets. ‘O’, ‘C’, ‘I’, and ‘M’ are abbreviations for OULU-NPU, CASIA-MFSD, Replay-Attack, and MSU-MFSD, respectively.
Table 4. Error rate (%) of different types of unseen attacks on the protocol O&C&I to H, where ‘H’ represents the HQ-WMCA [32] dataset, and ‘O’, ‘C’, and ‘I’ are abbreviations for OULU-NPU, CASIA-MFSD, and Replay-Attack, respectively.

Evaluation Metrics. Most existing methods are evaluated both qualitatively and quantitatively under the protocols mentioned above. The commonly used quantitative metrics are the Area Under the ROC Curve (AUC) and the Half Total Error Rate (HTER) [26]. HTER is the average of the False Rejection Rate (FRR) and the False Acceptance Rate (FAR). Visualization tools such as t-SNE [33] and Grad-CAM [34] are used to explore the decision-making process of the learned models and to assess their overall performance qualitatively.
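The two quantitative metrics can be computed from per-sample liveness scores as sketched below. The sketch assumes that a higher score means "more likely live" and that label 1 marks live faces. It also takes the decision threshold as an input, because how the threshold is chosen (for example, at the equal error rate on a development set) varies across papers. AUC is computed with the Mann-Whitney rank statistic, which equals the area under the ROC curve.

```python
import numpy as np

def far_frr(scores, labels, threshold):
    """labels: 1 = live, 0 = spoof. A sample is accepted as live if score >= threshold."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    accept = scores >= threshold
    far = np.mean(accept[labels == 0])    # spoof faces wrongly accepted
    frr = np.mean(~accept[labels == 1])   # live faces wrongly rejected
    return far, frr

def hter(scores, labels, threshold):
    """Half Total Error Rate: average of FAR and FRR at the given threshold."""
    far, frr = far_frr(scores, labels, threshold)
    return (far + frr) / 2.0

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (ties count as one half)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical scores for three live and three spoof faces.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(hter(scores, labels, threshold=0.5), auc(scores, labels))
```

In this toy example, one of three spoof faces is accepted and one of three live faces is rejected, so FAR = FRR = 1/3 and HTER = 1/3. Eight of the nine live/spoof pairs are ranked correctly, giving AUC = 8/9.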

Performance Analysis. To uncover the important factors affecting the generalization performance of face anti-spoofing models, we conduct extensive experiments with existing methods on four protocols and thoroughly analyze their performance (see Table 3). Overall, generalization performance has improved significantly on all four protocols over the course of four years, although there is still plenty of room for improvement.

Each of the four datasets in the protocols has its own characteristics, so the domain shift differs across the four protocols. According to Table 3, the state-of-the-art performance on each protocol is achieved by a different method. In other words, existing methods are still unable to learn fully domain-invariant features that close the gap between domains. The difference between the source-source and source-target domain shifts is thus an important factor affecting generalization performance. To illustrate this, we regard HQ-WMCA as the target domain and evaluate the impact of domain shift caused by unseen presentation attack instruments. The comparison results on the protocol O&C&I to H are shown in Table 4. SSDG-R18 achieves a low HTER on the protocol O&C&I to M. Even so, the model learned from the source domains O&C&I still has extremely high error rates on most types of unseen attacks from the HQ-WMCA dataset. Open-set domain generalization could be an effective way to address this problem.
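Per-attack-type error rates such as those in Table 4 can be sketched as follows. The sketch assumes that the error rate for an attack type is the fraction of attack samples of that type wrongly accepted as live (score at or above the threshold). This definition is an assumption, since the exact one behind Table 4 is not restated here. The attack-type names in the example are hypothetical.

```python
import numpy as np

def error_rate_per_attack(scores, attack_types, threshold):
    """Fraction of attack samples of each type accepted as live (score >= threshold)."""
    scores = np.asarray(scores, float)
    attack_types = np.asarray(attack_types)
    rates = {}
    for t in np.unique(attack_types):
        rates[str(t)] = float(np.mean(scores[attack_types == t] >= threshold))
    return rates

# Hypothetical liveness scores for attack samples only.
rates = error_rate_per_attack(
    scores=[0.9, 0.2, 0.7, 0.6, 0.1],
    attack_types=["print", "print", "mask", "mask", "replay"],
    threshold=0.5,
)
print(rates)
```

Reporting errors per attack type exposes failures that a single HTER averages away. A model can look strong overall while accepting almost every sample of one unseen instrument.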

Fig. 2. Performance comparison among different types of methods.

Moreover, reducing the number of source domains leads to a sharp decline in performance. This indicates that both the scale and the diversity of training data are important for the generalization performance of face anti-spoofing. As shown in Fig. 2, disentangled representation learning-based and domain alignment-based methods achieve better overall performance than the other categories. This suggests that an effective way to improve the generalization ability of face anti-spoofing is to suppress domain-specific features while carefully enforcing domain-shared features to be domain-invariant.

4 Future Research Directions

Although much effort has been devoted to domain generalization-based face anti-spoofing, as surveyed above, many open problems remain. Here we summarize some challenges and future research directions in this field.

1) Small-scale datasets. Most existing face anti-spoofing datasets are small in scale and have limited diversity in presentation attack patterns. Collecting a large-scale dataset covering diverse live and spoof face images is extremely difficult and prohibitively expensive. However, it has been shown that fine-tuning models pre-trained on large-scale datasets for downstream tasks can improve generalization ability. Large-scale pre-training, as well as self-supervised, semi-supervised, and few-shot domain generalization, is therefore worth investigating for face anti-spoofing. Besides, learning to generate novel domains to increase the diversity of source domains would be a promising direction.

2) Interpretability. Face anti-spoofing methods are usually deployed in scenarios with high security requirements, yet we still lack an understanding of the learned features and the decision-making behavior of models. Exploring interpretable domain generalization-based face anti-spoofing methods is therefore an urgent need for practical applications.

3) Heterogeneous approaches. As shown in Sect. 3, most current evaluation protocols are homogeneous, i.e., the source-source and source-target domain shifts are similar. In practical applications, however, the unseen target domain is unpredictable, and the source-target domain shift may differ greatly from the source-source one. It would be interesting to see more work on datasets, evaluation protocols, and algorithms for heterogeneous domain generalization-based face anti-spoofing.

5 Conclusion

In this paper, we present a comprehensive survey of existing studies on domain generalization-based face anti-spoofing. Specifically, we discuss in detail the problem definition, techniques, datasets, evaluation metrics, performance analysis, and future research directions. We hope this survey provides a valuable summary for researchers interested in face anti-spoofing and inspires future work that promotes the development of this field.