1 Introduction

The growing integration of machine learning models in high-stakes decision systems raises several ethical, legal and philosophical issues. Among them, fairness is often a desired property, in addition to being a legal requirement. Machine learning models extract and exploit correlations from their training data. However, such correlations may not be relevant, due to biases in data collection, processing, or sampling, or due to historical discrimination (Tommasi et al. 2017; Torralba and Efros 2011). To mitigate such negative biases, several fairness notions have emerged (Verma and Rubin 2018). In a nutshell, individual fairness (Dwork et al. 2012; Ignatiev et al. 2020) consists in ensuring that similar individuals receive similar treatment. In contrast, statistical fairness requires that the value of a given metric does not differ between specified subgroups of the population. The key idea here is that individuals should not receive different treatment based on their membership in a protected group. Finally, causal fairness analyzes the relationships between the different attributes and the decision to find (and possibly eliminate) correlations that can be a source of discrimination.

Many methods have been proposed in the literature to enhance the fairness of machine learning models (Caton and Haas 2020; Barocas et al. 2019; Ignatiev et al. 2020). However, models that are fair with respect to their training data may still exhibit unfairness when applied to previously unseen data. Indeed, fairness constraint overfitting (Cotter et al. 2018, 2019a) can occur, and fairness generalization has been identified as an open challenge for trustworthy machine learning (Chuang and Mroueh 2021; Cotter et al. 2018, 2019a; Huang and Vishnoi 2019; Mandal et al. 2020). Our objective in this paper is precisely to address this issue.

Recent work on fairness generalization focuses on integrating different techniques for improving robustness into existing fair learning algorithms. While such methods have been shown (theoretically and empirically) to improve fairness generalization, they often induce a considerable computational overhead (e.g., solving an additional problem to determine a worst-case unfairness (Mandal et al. 2020)), and thus have limited scalability. Some methods do not suffer from this drawback but instead require additional splitting of the data (Cotter et al. 2018, 2019a), hence possibly penalizing utility, as the amount of data used to update the model is reduced. Finally, other approaches have limited applicability, as they are designed for a particular algorithm or hypothesis class (Taskesen et al. 2020; Wang et al. 2021), or require some special property of the underlying algorithm (e.g., access to a cost-sensitive classification oracle (Mandal et al. 2020)). To tackle these issues, we propose a new framework for statistical fairness robustness. Intuitively, our approach consists in ensuring fairness over a variety of samplings of the training set. We show that this notion can be quantified precisely and leveraged to audit or train fair and robust machine learning models in practice. We additionally design a flexible and efficient heuristic method for learning robust and fair models, which can easily be integrated into existing fair classification methods formulated as constrained optimization problems. More precisely, our contributions can be summarized as follows.

  • We propose and study a sample-robust formulation of the fair learning problem, based on the Jaccard distance and inspired by Distributionally Robust Optimization (Ben-Tal et al. 2013; Duchi et al. 2021; Rahimian and Mehrotra 2019). The main idea of our method is that we want to meet a given fairness constraint even if the sampling of the training set had been somewhat different (i.e., if some examples had not been part of it).

  • We show how this exact formulation can be used to quantify the statistical fairness robustness of machine learning models. Our exact method is model-agnostic and can be applied to any type of hypothesis class.

  • We show that our exact method can be used for sample-robust fair learning and highlight its practical computational and integrability limitations.

  • We design a simple, efficient and flexible heuristic application of our proposed formulation and illustrate its versatility by integrating it into two fair learning algorithms of the literature.

  • We empirically evaluate both our exact and heuristic approaches and compare them on different datasets and statistical fairness metrics.

  • We empirically demonstrate the effectiveness and performance of the proposed heuristic approach on various datasets and metrics and compare it to a state-of-the-art method for improving statistical fairness generalization.

The paper is organized as follows. First, in Sect. 2, we describe the necessary background and review the relevant literature on statistical fairness generalization in machine learning. Afterwards, in Sect. 3, we formulate our notion of sample-robustness for fairness and study its implications and practical limitations. We motivate and introduce a heuristic application of our approach in Sect. 4 and show how to integrate it into state-of-the-art fair learning techniques. Then, we empirically evaluate our proposed approaches in Sect. 5 before concluding in Sect. 6.

2 Background & related work

In this section, we first present a high-level background on supervised machine learning, fairness and distributionally robust optimization. Then, we review existing methods addressing the problem of statistical fairness generalization.

2.1 Preliminaries

2.1.1 Supervised machine learning - classification

Let \({\mathcal {X}}{}\) denote the feature space, \({\mathcal {A}}{}\) the sensitive attributes and \({\mathcal {Y}}{}\) the label set. In addition, \({\mathcal {P}}{}\) will denote the true distribution over \({\mathcal {X}}{} \times {\mathcal {A}}{} \times {\mathcal {Y}}{}\) and \({\mathcal {D}}{}=(X,A,Y)\) a dataset drawn from \({\mathcal {P}}{}\). Given such a dataset and a hypothesis class of models \({\mathcal {H}}{}\), the objective of a supervised learning algorithm \({\mathcal {L}}\) is to build a model \({\mathcal {L}}({\mathcal {D}}{}) = h{} \in {\mathcal {H}}{}\) such that \(h{}\) minimizes a given objective function \(f_{obj}{}\).

For a specific training dataset \({\mathcal {D}}{}\) drawn from some distribution \({\mathcal {P}}{}\), the desired model h is the solution to the following problem, in which \(f_{obj}{}(h, {\mathcal {P}}{})\) is the expected objective function under distribution \({\mathcal {P}}{}\):

$$\begin{aligned} \mathop {{\mathrm{arg\,min}}}\limits _{h \in {\mathcal {H}}}& \quad f_{obj}{}(h, {\mathcal {P}}{}) \end{aligned}$$
(1)

In practice, \({\mathcal {P}}{}\) is often unknown, and we only get a limited number of observations from it, contained in the dataset \({\mathcal {D}}{}\). Then, the optimal solution of Problem (1) is commonly approximated by solving Problem (2).

$$\begin{aligned} \mathop {{\mathrm{arg\,min}}}\limits _{h \in {\mathcal {H}}}& \quad f_{obj}{}(h, {\mathcal {D}}{}) \end{aligned}$$
(2)
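For concreteness, the sketch below solves the empirical Problem (2) on a toy dataset. The logistic-loss objective, the linear hypothesis class, the gradient-descent solver and the synthetic data are illustrative assumptions made for this example, not choices made in this paper.

```python
import numpy as np

def f_obj(h, X, y):
    """Empirical logistic loss of a linear model h on the dataset D = (X, y)."""
    return np.mean(np.log1p(np.exp(-y * (X @ h))))   # labels y are in {-1, +1}

# Toy dataset D drawn from an (unknown) distribution P.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200))

# Approximate the solution of Problem (1) by solving Problem (2): minimize the
# empirical objective over a simple hypothesis class (linear models) by gradient descent.
h = np.zeros(3)
for _ in range(500):
    grad = -(X.T @ (y / (1.0 + np.exp(y * (X @ h))))) / len(y)
    h -= 0.5 * grad

print("empirical objective f_obj(h, D):", f_obj(h, X, y))
```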

2.1.2 Fairness in machine learning

In this work, we focus on statistical metrics for fairness. Such metrics aim at equalizing a given statistical measure (e.g., the True Positive Rate) across several (possibly overlapping) protected groups defined by the sensitive attributes, m denoting the number of protected groups. For each example \(e_i\) in \({\mathcal {D}}{}\), we denote by \(a_i \in A\) the list of sensitive attributes. Each coordinate \(k \in [1..m]\) of \(a_i\) indicates whether example \(e_i\) belongs to protected group k. Intuitively, the objective is to ensure that examples (e.g., profiles of individuals) receive similar treatment independently of the protected group they belong to. Depending on the particular value being equalized across groups, many metrics have been proposed, such as statistical parity (Dwork et al. 2012), predictive equality (Chouldechova 2017), equal opportunity (Hardt et al. 2016) and equalized odds (Hardt et al. 2016). These notions, as well as the statistical measure they equalize, are summarized in Table 1. In this paper, we denote by \(\mathsf {unf}{}(h,{\mathcal {D}}{})\) an oracle quantifying the unfairness of a classifier h over a dataset \({\mathcal {D}}{}\). The value of \(\mathsf {unf}{}(h,{\mathcal {D}}{})\) is in [0, 1]. The lower the value of \(\mathsf {unf}{(h,{\mathcal {D}}{})}\), the fairer h is over \({\mathcal {D}}{}\). In practice, we consider a maximum acceptable unfairness value \(\epsilon \in [0,1]\) (or, equivalently, a minimum acceptable fairness value \(1-\epsilon\)), and say that h is fair over \({\mathcal {D}}{}\) when \(\mathsf {unf}{(h,{\mathcal {D}}{})} \le \epsilon\).

Table 1 Summary of some statistical fairness measures
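For illustration, a minimal sketch of the unfairness oracle \(\mathsf {unf}{}(h,{\mathcal {D}}{})\) for two of the metrics of Table 1 is given below. The array-based encoding of predictions, labels and (binary, non-overlapping) group membership, as well as the function name, are assumptions made for this example.

```python
import numpy as np

def unf(y_pred, y_true, group, metric="statistical_parity"):
    """Unfairness of a classifier's predictions over a dataset with a binary
    protected attribute `group` (values 'a' / 'b'). Returns a value in [0, 1]."""
    rates = []
    for g in ("a", "b"):
        mask = (group == g)
        if metric == "statistical_parity":
            # P(h = 1 | group = g)
            rates.append(np.mean(y_pred[mask] == 1))
        elif metric == "equal_opportunity":
            # P(h = 1 | Y = 1, group = g): True Positive Rate of group g
            pos = mask & (y_true == 1)
            rates.append(np.mean(y_pred[pos] == 1))
        else:
            raise ValueError(metric)
    return abs(rates[0] - rates[1])

# A classifier h is deemed fair on D when unf(h, D) <= epsilon.
y_pred = np.array([1, 0, 1, 1, 0, 1])
y_true = np.array([1, 0, 1, 0, 1, 1])
group  = np.array(["a", "a", "a", "b", "b", "b"])
print(unf(y_pred, y_true, group))                          # statistical parity gap
print(unf(y_pred, y_true, group, "equal_opportunity"))     # TPR gap
```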

Several fairness-enhancing algorithms have been proposed in the literature. They can be divided into three categories, depending on the stage of the machine learning pipeline at which they intervene. Preprocessing techniques (Kamiran and Calders 2012) remove undesired biases from the training data before applying regular learning algorithms on the sanitized dataset. Post-processing algorithms (Hardt et al. 2016) modify the predictions of a (possibly unfair) classifier to achieve fairness. Finally, algorithmic modification (also called in-processing) techniques (Zafar et al. 2017) directly modify the learning algorithm to ensure that the model built is fair. These in-processing approaches naturally define a bi-objective optimization problem: minimizing error while maximizing fairness (or, equivalently, maximizing accuracy while minimizing unfairness). Several methods can then be used to solve this problem, including directly or indirectly optimizing a measure of fairness, or enforcing fairness constraints while learning an accurate model. In this paper, we are interested in fair learning methods formulated as constrained optimization problems, as described in Problem (3).

$$\begin{aligned} \mathop {{\mathrm{arg\,min}}}\limits _{h \in {\mathcal {H}}}& \quad f_{obj}{}(h, {\mathcal {D}}{}) \nonumber \\ \text {s.t.}& \quad {\mathsf {unf}{}(h,{\mathcal {D}}{})} \le \epsilon . \end{aligned}$$
(3)

2.1.3 Distributionally robust optimization in supervised machine learning

As stated in Sect. 2.1.1, an important challenge in machine learning is that we usually do not know the true underlying distribution \({\mathcal {P}}{}\). Instead, we often have access to a limited training set \({\mathcal {D}}{}\), whose distribution \({\mathcal {P}}{}'\) may differ from \({\mathcal {P}}{}\). To account for this uncertainty, Distributionally Robust Optimization (DRO) techniques can be leveraged. Instead of minimizing an objective function \(f_{obj}{}\) for a given distribution \({\mathcal {P}}{}'\), DRO (Ben-Tal et al. 2013; Duchi et al. 2021; Rahimian and Mehrotra 2019) consists in minimizing \(f_{obj}{}\) for a worst-case distribution among a set of perturbed versions of \({\mathcal {P}}{}'\) (Sagawa et al. 2020). More precisely, the objective is to build a model h minimizing \(f_{obj}{}\) over a set of neighbouring distributions of \({\mathcal {P}}{}'\). Such neighbouring distributions are contained in a perturbation set (also called ambiguity set) \({\mathcal {B}}{}({\mathcal {P}}{}')\). In the DRO setting, the supervised machine learning problem becomes:

$$\begin{aligned} \mathop {{\mathrm{arg\,min}}}\limits _{h \in {\mathcal {H}}}&\max \limits _{{\mathcal {Q}} \in {\mathcal {B}}{}({\mathcal {P}}{}')} f_{obj}{}(h, {\mathcal {Q}}). \end{aligned}$$
(4)

Distributionally Robust Optimization has been used in many domains (Rahimian and Mehrotra 2019), and has been applied widely in machine learning (Kang 2017).
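To make the inner maximization of Problem (4) concrete, the following sketch evaluates a worst-case expected loss over a small, explicitly enumerated ambiguity set of example re-weightings. Both the toy ambiguity set and the 0/1 loss are illustrative assumptions of this example.

```python
import numpy as np

def per_example_loss(h, X, y):
    """0/1 loss of a linear classifier h on each example."""
    return (np.sign(X @ h) != y).astype(float)

def worst_case_objective(h, X, y, ambiguity_set):
    """Inner maximization of Problem (4): worst expected loss over the
    distributions (weight vectors) contained in the ambiguity set B(P')."""
    losses = per_example_loss(h, X, y)
    return max(weights @ losses for weights in ambiguity_set)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = np.sign(rng.normal(size=6))
n = len(y)

# A toy ambiguity set: the empirical distribution plus a few re-weightings
# that up-weight one example at a time (each still sums to one).
uniform = np.full(n, 1.0 / n)
ambiguity_set = [uniform] + [
    np.where(np.arange(n) == i, 2.0 / (n + 1), 1.0 / (n + 1)) for i in range(n)
]

h = np.array([1.0, -1.0])
print("standard objective :", uniform @ per_example_loss(h, X, y))
print("DRO objective      :", worst_case_objective(h, X, y, ambiguity_set))
```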

2.2 Related work on improving statistical fairness generalization

To improve the generalization of statistical fairness, several approaches have been designed based on the method proposed by Agarwal et al. (2018), who formulated the problem of learning an accurate classifier under fairness constraints as a two-player zero-sum game. Considering the Lagrangian relaxation of this constrained optimization problem, the first player (\(\theta\)-player) optimizes the model’s parameters for the objective function with the current Lagrange multipliers, while the second player (\(\lambda\)-player) approximates the strongest Lagrangian relaxation by updating the Lagrange multipliers. In their original contribution, Agarwal et al. (2018) analyzed the fairness generalization error of the models trained using this framework. To avoid overfitting the fairness constraints, in Cotter et al. (2018, 2019a) the \(\lambda\)-player updates the Lagrange multipliers based on fairness violations measured on a separate validation set (instead of the training set itself). In Mandal et al. (2020), the \(\lambda\)-player uses linear programming to compute the worst-case fairness violation among a set of re-weightings of the training set. This approach falls into the category of DRO techniques.

Other methods also leverage DRO approaches. For instance, in Sagawa et al. (2020), a model is learnt while minimizing the maximum error over a set of protected groups defined by the value of some biased attributes. Several approaches have been proposed to tackle this worst-group error minimization problem. In particular, several methods do not require knowledge of the protected group membership of every training example. Indeed, annotating protected group membership for each training point can be costly in real-world settings (Duchi et al. 2020; Nam et al. 2020; Liu et al. 2021). Such methods do not reach the performance levels of the standard DRO approach with group knowledge, but constitute interesting alternatives. For example, Nam et al. (2020) and Liu et al. (2021) use two-stage approaches, in which they first train a model before leveraging its errors to train another, more robust one. Duchi et al. (2020) applies a DRO technique to approximate and optimize for a worst-case subpopulation above a certain size, without any group annotations.

In Taskesen et al. (2020), distributionally robust and fair logistic regression models are trained by optimizing the fairness-regularized objective function for a worst-case distribution. This most adversarial distribution is considered within an ambiguity set characterized as a Wasserstein distance-based ball around the original training distribution. Rezaei et al. (2020) also leverages the principles of DRO to optimize a robust logarithmic loss under fairness constraints. Their approach uses a minimax formulation, in which a fair predictor minimizes the training loss while a worst-case approximator of the population distribution (subject to statistic-matching constraints) maximizes it. In a similar line of work, Wang et al. (2021) proposes a distributionally robust measure of unfairness for the Equality of Opportunity metric. Robustness is achieved by computing the worst-case unfairness over a set of neighbouring distributions, within a type-\(\infty\) Wasserstein ambiguity set. Taking into account this measure enables the training of distributionally robust fair Support Vector Machines (SVM).

Du and Wu (2021) proposes two algorithms for fair and robust learning under sample selection bias. These two methods aim at estimating the sample selection probabilities, leveraging (or not) the availability of unlabeled unbiased data. The key point is that knowledge of these biased sample selection probabilities can be used to re-weight the training dataset to make it representative of the true distribution. As an approximation error exists, a minimax approach is used to optimize the objective function for the worst-case sample selection probabilities within a given radius around the estimated ones. The proposed method can only handle the statistical parity metric, which is approximated using decision boundary fairness and included as a regularization term in the objective function. One consequence is that robustness is enforced jointly for error and fairness. However, the fairness constraints may not be strictly satisfied.

Huang and Vishnoi (2019) propose adding a regularisation term, measuring prediction stability on the training set, to the objective function of a fair learning algorithm. This regularisation term aims at ensuring that the predictions of the built model do not vary too much when the training dataset is perturbed. In addition, this method comes with a theoretical bound on the generalization error. This work is closely related to ours, as we seek to improve fairness robustness on samplings of the training set (which can be viewed as a form of training fairness stability).

In a different line of work, Slack et al. (2020) have studied the scenario in which a model trained to be fair may behave unfairly on related but slightly different tasks. This paper introduces two contributions, namely Fairness Warnings and Fair-MAML. On the one hand, Fairness Warnings aims at predicting whether shifts in the features’ distributions may result in violating fairness. This is achieved by generating perturbed versions of the training set (only mean-shifting of the features is considered), measuring the resulting fairness violation and training an interpretable model to predict such violations given the features’ shifts. On the other hand, Fair-MAML aims to learn a fair model that can be adapted to new tasks using minimal (and possibly biased) task-specific data. This is done by adding a fairness regularizer (for either the Statistical Parity or Equal Opportunity metrics) to the loss of the Model Agnostic Meta Learning (MAML) framework.

More recently, Chuang and Mroueh (2021) proposed a data augmentation strategy that improves the generalization of fair classifiers. This method leverages existing data augmentation strategies to generate interpolated distributions between two given sensitive groups. During training, a regularisation term penalizes changes in the model’s predictions across the different interpolated distributions. The goal here is to ensure that the model has a smooth behavior along the “path” formed by the interpolated distributions between the two sensitive groups. This approach theoretically and empirically improves the fairness generalization of the models built.

Furthermore, fairness robustness has also been studied in other settings, such as multi-source learning (Iofinova et al. 2021), or for other notions of fairness such as individual fairness (Yurochkin et al. 2020). Both are out of the scope of this paper and thus we do not further detail these approaches.

Finally, the approach most closely related to ours is that of Mandal et al. (2020), which is based on a similar intuition, namely that ensuring fairness on a set of distributions neighbouring the training set (so-called re-weighted versions) can improve generalization. However, we consider different definitions of such neighbouring distributions and, in addition, we propose a heuristic variant of our approach exhibiting practical advantages compared to the exact one.

3 Sample-based robustness for statistical fairness

In this section, we present our sample-based approach to robustness for statistical fairness. In a nutshell, it aims at improving fairness generalization by enforcing the fairness constraints over particular samplings of the training set. More precisely, we first introduce a dataset sampling technique based on the Jaccard distance, which is used to define our perturbation sets, before characterizing the structure of such perturbation sets. Second, we define the sample-robust fair learning problem on a given perturbation set and show how unfairness metrics increase through neighbouring subsets. Then, we discuss conditions for ensuring perfect fairness sample-robustness and their implications for the resulting models. Afterwards, we introduce an approach to quantify this notion based on the resolution of an integer optimization problem. Finally, we show how to integrate this robustness formulation into existing learning algorithms solving Problem (3) and discuss the practical limitations of the approach.

3.1 Jaccard distance-based perturbation sets

Following the principles of DRO, it has been shown (Mandal et al. 2020; Sagawa et al. 2020; Taskesen et al. 2020; Wang et al. 2021) that enforcing fairness over a set of distributions that are neighbours of the training distribution is an effective way to improve fairness generalization. While DRO is formalized using distributions, practical machine learning applications usually deal with finite training sets sampled from an underlying distribution. Hence, instead of considering fairness robustness over perturbed underlying distributions (which, in practice, are unknown), we enforce robustness with respect to the training set sampling. For this reason, we propose to use the Jaccard distance \(J_\delta {}\) as the distance metric measuring dissimilarity between sample sets.

Definition 1

(Jaccard distance) Let \({\mathcal {D}}{}_1\) and \({\mathcal {D}}{}_2\) be two sample sets. The Jaccard distance between \({\mathcal {D}}{}_1\) and \({\mathcal {D}}{}_2\) is defined as follows: \(J_\delta {}({\mathcal {D}}{}_1,{\mathcal {D}}{}_2) = 1 - \frac{\vert {\mathcal {D}}{}_1 \cap {\mathcal {D}}{}_2 \vert }{\vert {\mathcal {D}}{}_1 \cup {\mathcal {D}}{}_2 \vert }\)

The Jaccard distance is a very popular measure, used to quantify (dis)similarity between sample sets in a wide range of applications. For example, it has been used in Machine Learning for feature ranking stability (Khoshgoftaar et al. 2013; Saeys et al. 2008) and feature selection (Zou et al. 2016). Intuitively, two sample sets \({\mathcal {D}}_1\) and \({\mathcal {D}}_2\) that have a large intersection are close (i.e., \(J_\delta {}({\mathcal {D}}{}_1,{\mathcal {D}}{}_2)\) is small and in particular \(J_\delta {}({\mathcal {D}}{},{\mathcal {D}}{}) = 0\)) while two sample sets \({\mathcal {D}}_3\) and \({\mathcal {D}}_4\) with empty intersection are far from each other (i.e., \(J_\delta {}({\mathcal {D}}{}_3,{\mathcal {D}}{}_4)\) is 1).
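A direct implementation of Definition 1 on sets of example indices, used here purely for illustration, could look as follows.

```python
def jaccard_distance(d1: set, d2: set) -> float:
    """Jaccard distance between two sample sets (Definition 1)."""
    if not d1 and not d2:
        return 0.0
    return 1.0 - len(d1 & d2) / len(d1 | d2)

# Datasets are represented by the sets of their example indices.
D  = {1, 2, 3, 4, 5}
D1 = {1, 2, 3, 4}       # large intersection with D -> small distance
D2 = {6, 7}             # empty intersection with D -> distance 1
print(jaccard_distance(D, D))    # 0.0
print(jaccard_distance(D, D1))   # 0.2
print(jaccard_distance(D, D2))   # 1.0
```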

We now define the perturbation sets of a given dataset \({\mathcal {D}}{}\) and highlight some consequences.

Definition 2

(Perturbation sets) Let \(d\in [0,1]\), we define a perturbation set \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) as the set of subsets of \({\mathcal {D}}{}\) whose Jaccard distance from \({\mathcal {D}}{}\) is less than or equal to d. That is, \({\mathcal {B}}{}({\mathcal {D}}{}, d) = \{ {\mathcal {D}}{}' \mid J_\delta {}({\mathcal {D}}{},{\mathcal {D}}{}') \le d \wedge ({\mathcal {D}}{}' \subseteq {\mathcal {D}}{}) \}\).

Definition 2 states that \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) contains all subsets of \({\mathcal {D}}{}\) of size at least \(|{\mathcal {D}}{} |\times (1 - d)\). A special case arises when \(d=0\), as \({\mathcal {B}}{}({\mathcal {D}}{}, 0)\) then contains only \({\mathcal {D}}{}\) itself. Then, because the perturbation sets defined in Definition 2 contain only subsets of \({\mathcal {D}}{}\), the Jaccard distance between such subsets and \({\mathcal {D}}{}\) is necessarily of the form \(\frac{i}{|{\mathcal {D}}{} |}\), in which i is an integer between 0 and \(|{\mathcal {D}}{} |\) (as the union between \({\mathcal {D}}{}\) and any of its subsets is \({\mathcal {D}}{}\) itself).
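For small datasets, the perturbation sets of Definition 2 can be enumerated explicitly by brute force, as in the sketch below; the index-set encoding and the function name are assumptions made for this example.

```python
from itertools import combinations
from math import floor

def perturbation_set(D: set, d: float):
    """B(D, d): all subsets of D whose Jaccard distance from D is at most d
    (Definition 2), i.e. all subsets of size at least |D| * (1 - d)."""
    min_size = len(D) - floor(d * len(D))
    subsets = []
    for k in range(min_size, len(D) + 1):
        subsets.extend(set(c) for c in combinations(sorted(D), k))
    return subsets

D = {1, 2, 3, 4, 5}
print(len(perturbation_set(D, 0.0)))   # 1  (D itself)
print(len(perturbation_set(D, 0.2)))   # 6  (D plus the 5 subsets of size 4)
print(len(perturbation_set(D, 0.4)))   # 16 (adds the 10 subsets of size 3)
```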

Notice that extending a perturbation set consists in adding new subsets of the original training set. An immediate consequence of Definition 2 is that a perturbation set defined with Jaccard distance d is included in perturbation sets with higher distance \(d'\). This is stated in the following proposition:

Proposition 1

(Perturbation sets inclusion) Consider a dataset \({\mathcal {D}}{}\) and two Jaccard distances \(d, d' \in [0,1]\). Then, \(d \le d' \implies {\mathcal {B}}{}({\mathcal {D}}{},d) \subseteq {\mathcal {B}}{}({\mathcal {D}}{},d')\)

Proof

Consider \({\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{},d)\). Based on Definition 2, \({\mathcal {D}}{}' \subseteq {\mathcal {D}}{}\) and \(J_\delta {}({\mathcal {D}}{},{\mathcal {D}}{}') \le d \le d'\). Hence, \({\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{},d')\) and \({\mathcal {B}}{}({\mathcal {D}}{},d) \subseteq {\mathcal {B}}{}({\mathcal {D}}{},d')\). \(\square\)

In the following definition, we introduce an additional notation to facilitate the study of the space of our perturbation sets. We then formalize the notion of neighbouring datasets, that will be useful in the remainder of our analysis.

Definition 3

(Sets of equidistant subsets) Let i be an integer between 0 and \(|{\mathcal {D}}{} |\). Then, for any d of the form \(\frac{i}{|{\mathcal {D}}{} |}\), we define \(\Gamma ({\mathcal {D}}{},d)\) as the set of subsets of \({\mathcal {D}}{}\) whose Jaccard distance from \({\mathcal {D}}{}\) is exactly d:

$$\begin{aligned} \Gamma ({\mathcal {D}}{},d) = \{ {\mathcal {D}}{}' ~\mid ~ J_\delta {}({\mathcal {D}}{},{\mathcal {D}}{}') = d ~ \wedge ~ ({\mathcal {D}}{}' \subseteq {\mathcal {D}}{}) \} \end{aligned}$$

As an immediate consequence of Definition 3, \(\{\Gamma ({\mathcal {D}}{},d') \mid d' \le d\}\) constitutes a partition of \({\mathcal {B}}{}({\mathcal {D}}{},d)\).

Definition 4

(Neighbouring datasets) Let \({\mathcal {D}}{}_1\) and \({\mathcal {D}}{}_2\) be two sample sets. \({\mathcal {D}}{}_1\) and \({\mathcal {D}}{}_2\) are called neighbouring datasets (in the Jaccard sense) if and only if \(J_\delta {}({\mathcal {D}}{}_1,{\mathcal {D}}{}_2) = \frac{1}{|{\mathcal {D}}{}_1 \cup {\mathcal {D}}{}_2 |}\). This means that \({\mathcal {D}}{}_1\) and \({\mathcal {D}}{}_2\) differ by exactly one element.

In a nutshell, the subsets of \({\mathcal {D}}{}\) contained in \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) can be seen as points in a metric space equipped with the Jaccard distance, contained within a ball centered around \({\mathcal {D}}{}\) whose radius is d. This ball is itself contained within all sets \({\mathcal {B}}{}({\mathcal {D}}{}, d')\) with \(d'\ge d\). Again, because we restrict our attention to subsets of the training set, \(\Gamma ({\mathcal {D}}{},d)\) is a sphere centered at \({\mathcal {D}}{}\) with radius d. The ball \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) thus includes all spheres \(\Gamma ({\mathcal {D}}{},d')\) with radius \(d' \le d\).

An interesting representation of the space of our perturbation sets can be obtained using a nearest neighbours (in the Jaccard sense) graph. In such a graph, each vertex represents a subset of the training set, and an edge between two vertices means that their associated sets are neighbours. Figure 1 uses this representation to illustrate the structure of our perturbation sets on a toy dataset with two protected groups for the statistical fairness measure. In the remainder of the paper, the perturbation sets are used to enforce fairness constraints. Thus, in this graph, at least one example of each protected group must be present in each subset. Based on this representation, we could also derive a directed graph \(\mathcal {G'}\) in which each edge from \({\mathcal {D}}{}_1\) to \({\mathcal {D}}{}_2\) means that \({\mathcal {D}}{}_2 \subseteq {\mathcal {D}}{}_1\), thus representing a superset relationship.

Fig. 1

Example of perturbation sets for a dataset \({\mathcal {D}}{}\) with 5 examples and two protected groups a (\(\{e_1, e_2, e_3\}\)) and b (\(\{e_4, e_5\}\)). Subsets that cannot be used to audit a model’s fairness with respect to protected groups a and b are not represented

We now formulate a recursive definition of \({\mathcal {B}}{}({\mathcal {D}}{},d)\). First, observe that the smallest perturbation set \({\mathcal {B}}{}({\mathcal {D}}{},d)\) not restricted to \({\mathcal {D}}{}\) itself is \({\mathcal {B}}{}({\mathcal {D}}{}, \frac{1}{|{\mathcal {D}}{} |})\). It contains all subsets of \({\mathcal {D}}{}\) formed by removing at most one example from \({\mathcal {D}}{}\). This is a particular case of Proposition 2, which generalizes this observation.

Proposition 2

(Perturbation sets structure) Consider a dataset \({\mathcal {D}}{}\) and a Jaccard distance \(d \in [0,1-\frac{1}{|{\mathcal {D}}{} |}]\). We can formulate recursive definitions for \(\Gamma ({\mathcal {D}}{},d)\) and \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) as follows:

  • \(\Gamma ({\mathcal {D}}{},d+\frac{1}{|{\mathcal {D}}{} |}) = \{{\mathcal {D}}{}'' \mid \exists {\mathcal {D}}{}' \in \Gamma ({\mathcal {D}}{},d) \mid {\mathcal {D}}{}'' \subset {\mathcal {D}}{}' \wedge |{\mathcal {D}}{}'' |= |{\mathcal {D}}{}' |-1\}\)

  • \({\mathcal {B}}{}({\mathcal {D}}{}, d+\frac{1}{|{\mathcal {D}}{} |}) = {\mathcal {B}}{}({\mathcal {D}}{}, d) \cup \Gamma ({\mathcal {D}}{},d+\frac{1}{|{\mathcal {D}}{} |})\)

Proof

  • By construction following the definition of \(\Gamma ({\mathcal {D}}{},d+\frac{1}{|{\mathcal {D}}{} |})\) (Definition 3).

  • Definition 2 states that \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) contains all subsets of \({\mathcal {D}}{}\) up to a Jaccard distance d. Definition 3 states that \(\Gamma ({\mathcal {D}}{},d+\frac{1}{|{\mathcal {D}}{} |})\) contains all subsets of \({\mathcal {D}}{}\) whose Jaccard distance from \({\mathcal {D}}{}\) is exactly \(d+\frac{1}{|{\mathcal {D}}{} |}\). Thus, the union of these two sets define exactly \({\mathcal {B}}{}({\mathcal {D}}{}, d+\frac{1}{|{\mathcal {D}}{} |})\), which, according to Definition 2, contains all subsets of \({\mathcal {D}}{}\) up to a Jaccard distance of \(d+\frac{1}{|{\mathcal {D}}{} |}\).

\(\square\)

Proposition 2 states that the smallest set that is strictly bigger than \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) is \({\mathcal {B}}{}({\mathcal {D}}{}, d + \frac{1}{|{\mathcal {D}}{} |})\). By construction, it contains \({\mathcal {B}}{}({\mathcal {D}}{}, d)\). The sets outside \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) and inside \({\mathcal {B}}{}({\mathcal {D}}{}, d + \frac{1}{|{\mathcal {D}}{} |})\) are exactly those in \(\Gamma ({\mathcal {D}}{},d+\frac{1}{|{\mathcal {D}}{} |})\). The latter includes all sets formed by removing exactly one element from the (smallest) sets in \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) (which form \(\Gamma ({\mathcal {D}}{},d)\)). Again, this can be visualized in Fig. 1.
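The recursion of Proposition 2 can be illustrated by the following sketch, which builds the successive spheres and balls over sets of example indices (an illustrative encoding; the function name is not taken from the paper).

```python
def next_sphere(gamma_d):
    """Gamma(D, d + 1/|D|): all sets obtained by removing exactly one element
    from some set in Gamma(D, d) (first item of Proposition 2)."""
    return {frozenset(s - {e}) for s in gamma_d for e in s}

D = frozenset({1, 2, 3, 4})
sphere = {D}                 # Gamma(D, 0) = {D}
ball = set(sphere)           # B(D, 0)
for i in range(1, len(D) + 1):
    sphere = next_sphere(sphere)   # Gamma(D, i/|D|)
    ball |= sphere                 # B(D, i/|D|) = B(D, (i-1)/|D|) united with Gamma(D, i/|D|)
    print(f"d = {i}/{len(D)}: |Gamma| = {len(sphere)}, |B| = {len(ball)}")
```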

One may remark that instead of considering only subsets of the training set, we could take into account all neighbouring sets (as stated in Definition 4). This would require considering that examples can be added to our subsets. Even though this formulation can seem theoretically appealing, it does not have the interesting structure that we studied in this section and quantifying it is computationally harder as the denominator of the Jaccard distance would no longer be a constant.

3.2 Sample-robust fair learning with our perturbation sets

Similar to DRO, the proposed approach consists in ensuring a given property (e.g., fairness) over a set of elements contained in a perturbation set. In DRO, such elements are distributions, while we consider sample sets instead. By doing so, our objective is to improve fairness generalization on unseen data. Hence, our perturbation sets contain different samplings of the original training set. Thus, we propose to study a sample-robust fair learning problem to achieve fairness generalization.

We formulate our sample-robust fair learning problem before characterizing the evolution of unfairness through the considered subsets. Then, we investigate the conditions and implications for perfectly satisfying our proposed fairness sample-robustness criterion.

3.2.1 Robust fair learning for a given perturbation set

By considering the perturbation set \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) as a set of samplings of the dataset \({\mathcal {D}}{}\), we aim at building a model that is fair on all sets of \({\mathcal {B}}{}({\mathcal {D}}{}, d)\), including \({\mathcal {D}}{}\) itself. This leads to the formulation of our sample-robust version of Problem (3).

The sample-robust fair learning problem on a perturbation set \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) is formulated as follows:

$$\begin{aligned} \mathop {{\mathrm{arg\,min}}}\limits _{h \in {\mathcal {H}}}& \quad f_{obj}{}(h, {\mathcal {D}}{}) \nonumber \\ \text {s.t.}& \quad \underset{{\mathcal {D}}{}'\in {\mathcal {B}}{}({\mathcal {D}}{}, d)}{\max } {\mathsf {unf}{}(h,{\mathcal {D}}{}')} \le \epsilon . \end{aligned}$$
(5)

This formulation is a particular instantiation of the general DRO formulation of Problem (4), in which robustness is applied only on the enforced fairness constraints rather than on the objective function. An optimal solution to Problem (5) corresponds to a model \(h{}\) that minimizes the objective function \(f_{obj}{}\) on \({\mathcal {D}}{}\), among those of \({\mathcal {H}}{}\) that exhibit unfairness at most \(\epsilon\) over all sets contained in \({\mathcal {B}}{}({\mathcal {D}}{}, d)\), including \({\mathcal {D}}{}\) itself.
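For very small datasets, the robust constraint of Problem (5) can be audited exhaustively for a fixed classifier by combining subset enumeration with the unfairness oracle. The brute-force sketch below (Statistical Parity, with an illustrative encoding of predictions and groups, and hypothetical function names) is only meant to make the constraint concrete and does not scale; observe in its output that the worst-case violation can only grow with d.

```python
from itertools import combinations
from math import floor

def statistical_parity_unf(examples):
    """unf(h, D') for statistical parity: examples is a sequence of
    (prediction, group) pairs with group in {'a', 'b'}."""
    rates = []
    for g in ("a", "b"):
        preds = [p for p, grp in examples if grp == g]
        if not preds:           # D' cannot be audited w.r.t. group g
            return None
        rates.append(sum(preds) / len(preds))
    return abs(rates[0] - rates[1])

def worst_case_unfairness(D, d):
    """max over D' in B(D, d) of unf(h, D'), ignoring subsets that lack
    examples from one of the protected groups (cf. Fig. 1)."""
    min_size = len(D) - floor(d * len(D))
    worst = 0.0
    for k in range(min_size, len(D) + 1):
        for subset in combinations(D, k):
            u = statistical_parity_unf(subset)
            if u is not None:
                worst = max(worst, u)
    return worst

# Toy dataset: (prediction of h, protected group) for each example.
D = [(1, "a"), (0, "a"), (1, "a"), (1, "b"), (0, "b")]
print(worst_case_unfairness(D, 0.0))   # unfairness on D itself
print(worst_case_unfairness(D, 0.2))   # worst case after removing up to 1 example
print(worst_case_unfairness(D, 0.4))   # worst case after removing up to 2 examples
```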

With the proposed perturbation sets definition, we observe that increasing the distance d increases the number of subsets being considered, as stated in Proposition 1. As a consequence, considering higher values of d can only raise the worst-case fairness violation, thus making the problem harder. This is formalized in Proposition 3. Hence, the parameter d directly controls the strength of the robustness enforced on the fairness constraint.

Proposition 3

(Worst-case fairness violation monotonicity with respect to d) Consider a dataset \({\mathcal {D}}{}\) and a classifier h.

$$\begin{aligned} \forall d,d' \in [0,1], d \le d' \implies \underset{{\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)}{\max } {\mathsf {unf}{}(h,{\mathcal {D}}{}')} \le \underset{{\mathcal {D}}{}'' \in {\mathcal {B}}{}({\mathcal {D}}{}, d')}{\max } {\mathsf {unf}{}(h,{\mathcal {D}}{}'')}. \end{aligned}$$

Proof

According to Proposition 1, if \(d \le d'\) then \({\mathcal {B}}{}({\mathcal {D}}{},d) \subseteq {\mathcal {B}}{}({\mathcal {D}}{},d')\). Thus, the maximum unfairness over all sets in \({\mathcal {B}}{}({\mathcal {D}}{},d)\) is less than or equal to the maximum unfairness over all sets in \({\mathcal {B}}{}({\mathcal {D}}{}, d')\). \(\square\)

3.2.2 Unfairness increase through neighbouring subsets

In order to characterize the possible increase of fairness violation (i.e., the strength of the fairness constraints applied in Problem (5)) induced by increasing the size of the considered perturbation set, we first need to introduce the notion of sensitivity, which is directly taken from the differential privacy framework (Dwork and Roth 2014) in Definition 5.

Definition 5

(Unfairness \(l_1\)-sensitivity) The unfairness measure \(l_1\)-sensitivity \(\gamma\) quantifies the maximum contribution of a single example to the unfairness value of any classifier, for any pair of neighbouring datasets:

$$\begin{aligned} \gamma&= \underset{\begin{array}{c} h \in {\mathcal {H}}\\ {\mathcal {D}}_1{},{\mathcal {D}}_2{}\\ J_\delta {}({\mathcal {D}}{}_1,{\mathcal {D}}{}_2)= \frac{1}{|{\mathcal {D}}{}_1 \cup {\mathcal {D}}{}_2 |} \end{array}}{\max } |\mathsf {unf}{}(h,{\mathcal {D}}_2{}) - \mathsf {unf}{}(h,{\mathcal {D}}_1{}) |\end{aligned}$$

The increase of the worst-case fairness violation induced by extending a perturbation set \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) to the next one \({\mathcal {B}}{}({\mathcal {D}}{}, d+\frac{1}{|{\mathcal {D}}{} |})\) can then be upper bounded as shown in the next proposition. This is due to the bounded sensitivity (as stated in Definition 5) of the unfairness measure \(\mathsf {unf}{}(.)\) at hand, which has been proved for common statistical fairness metrics in the context of differentially private and fair learning (Cummings et al. 2019). The formalization of the bound goes as follows.

Proposition 4

(Bounded worst-case fairness violation increase between consecutive perturbation sets (general case)) Consider a dataset \({\mathcal {D}}{}\) and a classifier h. Let \(\gamma\) be the \(l_1\)-sensitivity of the unfairness measure and a Jaccard distance \(d \in [0,1-\frac{1}{|{\mathcal {D}}{} |}]\), we have:

$$\begin{aligned} \underset{{\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)}{\max } {\mathsf {unf}{}(h,{\mathcal {D}}{}')} \le \underset{{\mathcal {D}}{}'' \in {\mathcal {B}}{}({\mathcal {D}}{}, d+\frac{1}{|{\mathcal {D}}{} |})}{\max } {\mathsf {unf}{}(h,{\mathcal {D}}{}'')} \le \underset{{\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)}{\max } {\mathsf {unf}{}(h,{\mathcal {D}}{}')} + \gamma . \end{aligned}$$

Proof

  • Left inequality: Follows from Proposition 3 and the fact that \(d \le d + \frac{1}{|{\mathcal {D}}{} |}\).

  • Right inequality: Consider \({\mathcal {D}}{}'' \in {\mathcal {B}}{}({\mathcal {D}}{}, d+\frac{1}{|{\mathcal {D}}{} |})\). By Proposition 2, we know that \({\mathcal {D}}{}''\) is either in \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) or in \(\Gamma ({\mathcal {D}}{},d+\frac{1}{|{\mathcal {D}}{} |})\).

    • In the first case, \({\mathcal {D}}{}'' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)\). The maximum unfairness measure across the perturbation set is not worsened and we have:

      $$\begin{aligned} \mathsf {unf}{}(h,{\mathcal {D}}{}'') \le \underset{{\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)}{\max } {\mathsf {unf}{}(h,{\mathcal {D}}{}')} \end{aligned}$$
    • In the second case, \({\mathcal {D}}{}'' \in \Gamma ({\mathcal {D}}{},d+\frac{1}{|{\mathcal {D}}{} |})\). By definition of \(\Gamma\), we know that there exists some set \({\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)\) such that \({\mathcal {D}}{}''\) is formed by removing exactly one element from \({\mathcal {D}}{}'\). Hence, \({\mathcal {D}}{}'\) and \({\mathcal {D}}{}''\) are neighbouring datasets, and following Definition 5, we know that:

      $$\begin{aligned} |\mathsf {unf}{}(h,{\mathcal {D}}{}'') - \mathsf {unf}{}(h,{\mathcal {D}}{}') |\le \gamma&\implies \mathsf {unf}{}(h,{\mathcal {D}}{}'') \le \mathsf {unf}{}(h,{\mathcal {D}}{}')+ \gamma \\&\implies \mathsf {unf}{}(h,{\mathcal {D}}{}'') \le \underset{{\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)}{\max } {\mathsf {unf}{}(h,{\mathcal {D}}{}')} + \gamma . \end{aligned}$$

\(\square\)

This property can be visualized using the Jaccard neighbouring graph of Fig. 1. It states that extending the perturbation set \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) by adding one more layer of the graph (the sets contained in \(\Gamma ({\mathcal {D}}{},d+\frac{1}{|{\mathcal {D}}{} |})\)) cannot worsen the worst-case fairness violation by more than \(\gamma\). This is due to the fact that the unfairness measure cannot be increased by more than \(\gamma\) between any two neighbouring datasets (i.e., represented by two connected vertices in the Jaccard neighbours graph).

While Proposition 4 does not rely on any specific fairness formulation, depending on the metric at hand, tight values of \(\gamma\) can be computed. For instance, considering common statistical fairness metrics, we demonstrate that an exact finite value of \(\gamma\) can be computed for each subset, given the classifier’s predictions. To ease readability, hereafter we consider the binary sensitive attribute setting, in which there are only two non-overlapping protected groups a and b. However, note that the following proposition generalizes to any number of protected groups by considering all pairs of protected groups, as the resulting unfairness can be measured as the maximum pairwise unfairness.

Common statistical unfairness metrics can be defined using expressions of the form \(\mathsf {unf}{}(h,{\mathcal {D}}{}_1) = |\frac{S_a^ {{\mathcal {D}}{}_1}}{X_a^ {{\mathcal {D}}{}_1}} - \frac{S_b^ {{\mathcal {D}}{}_1}}{X_b^ {{\mathcal {D}}{}_1}} |\), in which \(\frac{S_i^ {{\mathcal {D}}{}_1}}{X_i^ {{\mathcal {D}}{}_1}}\) is the chosen statistical measure of classifier h for group \(i \in \{a,b\}\). As it will always be clear from context, we do not include index h in the notations S and X.

For instance, for the Statistical Parity metric, \(S_i^ {{\mathcal {D}}{}_1}\) (respectively \(X_i^{ {\mathcal {D}}{}_1}\)) is the number of positive predictions (respectively number of examples) among group i in the dataset \({\mathcal {D}}{}_1\), given classifier h’s predictions. For the Equal Opportunity metric, \(S_i^ {{\mathcal {D}}{}_1}\) (respectively \(X_i^{ {\mathcal {D}}{}_1}\)) is the number of true positive predictions (respectively number of positively labeled examples) among group i, for the dataset \({\mathcal {D}}{}_1\), given classifier h’s predictions.

While the particular measures of \(S_i^{ {\mathcal {D}}{}_1}\) and \(X_i^{ {\mathcal {D}}{}_1}\) depend on the fairness metric at hand, our proposed formulation covers all common statistical fairness metrics (in particular, those considered in this paper and listed in Table 1). Additionally, we always have \(S_i^ {{\mathcal {D}}{}} \le X_i^ {{\mathcal {D}}{}}\), because the examples counted within \(S_i^ {{\mathcal {D}}{}}\) correspond to a subset of those counted within \(X_i^ {{\mathcal {D}}{}}\) that satisfy a given condition.

We assume without loss of generality that \(\frac{S_a^ {{\mathcal {D}}{}_1}}{X_a^ {{\mathcal {D}}{}_1}} > \frac{S_b^ {{\mathcal {D}}{}_1}}{X_b^ {{\mathcal {D}}{}_1}}\). We also assume \(X_a^ {{\mathcal {D}}{}_1} > 0\) and \(X_b^ {{\mathcal {D}}{}_1} > 0\), which means that the (sub)set \({\mathcal {D}}{}_1\) contains examples from both protected groups that can be used to quantify fairness. Otherwise, \({\mathcal {D}}{}_1\) cannot be used to audit a model’s fairness with respect to protected groups a and b. An exact value of \(\gamma\) can then be obtained, which depends on the particular (sub)set \({\mathcal {D}}{}_1\) considered. Indeed, the sensitivity of common statistical fairness metrics is data-dependent, as discussed in Cummings et al. (2019) in the context of differentially-private fair learning. By a slight abuse of notation, we define \(\gamma (h, {\mathcal {D}}{}_1)\) as the local \(l_1\)-sensitivity (as opposed to the global sensitivity \(\gamma\)) of the unfairness measure, given a classifier h, considering a dataset \({\mathcal {D}}_1{}\) and any neighbouring dataset \({\mathcal {D}}{}_2\) (see Definition 4) such that \({\mathcal {D}}{}_2 \subseteq {\mathcal {D}}{}_1\). In other words, \(\gamma (h, {\mathcal {D}}{}_1)\) quantifies the maximum unfairness increase made possible by removing at most one element from \({\mathcal {D}}{}_1\). It upper bounds the fairness violation increase between \({\mathcal {D}}{}_1\) and any of its neighbours \({\mathcal {D}}{}_2\) such that \({\mathcal {D}}{}_2 \subseteq {\mathcal {D}}{}_1\):

$$\begin{aligned} \gamma (h, {\mathcal {D}}{}_1)&=\underset{\begin{array}{c} {\mathcal {D}}{}_2 \subseteq {\mathcal {D}}{}_1\\ J_\delta {}({\mathcal {D}}{}_1,{\mathcal {D}}{}_2)= \frac{1}{|{\mathcal {D}}{}_1 \cup {\mathcal {D}}{}_2 |} \end{array}}{\max } ||\mathsf {unf}{}(h,{\mathcal {D}}_2{}) - \mathsf {unf}{}(h,{\mathcal {D}}_1{}) ||_1 \\&= \underset{\begin{array}{c} {\mathcal {D}}{}_2 \subseteq {\mathcal {D}}{}_1\\ J_\delta {}({\mathcal {D}}{}_1,{\mathcal {D}}{}_2)= \frac{1}{|{\mathcal {D}}{}_1 \cup {\mathcal {D}}{}_2 |} \end{array}}{\max } |\mathsf {unf}{}(h,{\mathcal {D}}_2{}) - \mathsf {unf}{}(h,{\mathcal {D}}_1{}) |\end{aligned}$$

We can illustrate this notion with Fig. 1. On the one hand, the global sensitivity \(\gamma\) (Proposition 4) upper bounds the increase of the fairness violation along all edges of the graph. On the other hand, by leveraging the statistical fairness metrics’ formulation, we can compute the (tighter) value of the local sensitivity \(\gamma (h, {\mathcal {D}}{}_1)\), upper-bounding the unfairness increase along all edges outgoing from the vertex associated with \({\mathcal {D}}{}_1\) in \(\mathcal {G'}\) (the directed graph derived from Fig. 1, in which each edge from \({\mathcal {D}}{}_1\) to \({\mathcal {D}}{}_2\) means that \({\mathcal {D}}{}_2 \subseteq {\mathcal {D}}{}_1\)).

Proposition 5

(Bounded worst-case fairness violation increase between consecutive perturbation sets (statistical fairness metrics)) Consider a dataset \({\mathcal {D}}{}\), a classifier h and a Jaccard distance \(d \in [0,1-\frac{1}{|{\mathcal {D}}{} |}]\). We have:

  1. 1.

    Given a subset \({\mathcal {D}}{}'\) of \({\mathcal {D}}{}\), the value of \(\gamma (h, {\mathcal {D}}{}')\) can be computed explicitly and has finite value.

  2. 2.

    \(\underset{{\mathcal {D}}{}'' \in {\mathcal {B}}{}({\mathcal {D}}{}, d+\frac{1}{|{\mathcal {D}}{} |})}{\max }{\mathsf {unf}{}(h,{\mathcal {D}}{}'')} \le \underset{{\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)}{\max }{\mathsf {unf}{}(h,{\mathcal {D}}{}')+ \gamma (h, {\mathcal {D}}{}')}\).

  3. 3.

    \(\exists {\mathcal {D}}{}'' \in {\mathcal {B}}{}({\mathcal {D}}{}, d+\frac{1}{|{\mathcal {D}}{} |})\) such that \(\mathsf {unf}{}(h,{\mathcal {D}}{}'') = \underset{{\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)}{\max }{\mathsf {unf}{}(h,{\mathcal {D}}{}') + \gamma (h, {\mathcal {D}}{}')}\).

Proof

We sketch the proof here and give the details in Appendix A. In this Proposition, \(\gamma (h, {\mathcal {D}}{}')\) is the maximum increase of the unfairness measure made possible by removing at most one example from \({\mathcal {D}}{}'\). For statistical metrics, \(\mathsf {unf}{}(h,{\mathcal {D}}{}') = |\frac{S_a^ {{\mathcal {D}}{}'}}{X_a^ {{\mathcal {D}}{}'}} - \frac{S_b^ {{\mathcal {D}}{}'}}{X_b^ {{\mathcal {D}}{}'}} |\) and we observe that there are exactly four ways of modifying \(\mathsf {unf}{}(h,{\mathcal {D}}{}')\): removing an example of group a satisfying or not satisfying the measure, or removing an example of group b satisfying or not satisfying the measure. Note that this makes the complexity of computing \(\gamma (h,{\mathcal {D}}')\) independent of \(|{\mathcal {D}}' |\) (effectively resulting in \(\gamma (h,{\mathcal {D}}')\) being computed in constant time for any \({\mathcal {D}}'\)). \(\square\)

Proposition 5 shows that the worst-case unfairness increase induced by extending our perturbation set with a minimal change can be bounded, and that this upper bound can be reached (i.e., the corresponding subset can be built by carefully selecting the element to be removed from \({\mathcal {D}}{}'\)). The analysis of this bound (explicitly stated in Appendix A) demonstrates that the larger both groups are, the smoother \(\mathsf {unf}{}(.)\) is. This observation naturally holds for any value of the Jaccard distance d, theoretically highlighting the advantage of working with sufficiently large protected groups. Indeed, protected groups that are too small cause unfairness to have a higher sensitivity, hence being less stable to isolated changes in the data.
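Following the four cases identified in the proof of Proposition 5, the local sensitivity \(\gamma (h, {\mathcal {D}}{}')\) can be computed in constant time from the counts \(S_a, X_a, S_b, X_b\). The sketch below is one possible implementation under the assumptions of this section (binary non-overlapping groups); the requirement of keeping at least one auditable example per group, as well as the function name, are assumptions of the example. The second call illustrates the smoothing effect of larger protected groups.

```python
def local_sensitivity(S_a, X_a, S_b, X_b):
    """gamma(h, D'): largest change of |S_a/X_a - S_b/X_b| achievable by
    removing a single example from D', following the four cases of the
    proof of Proposition 5 (remove an a- or b-example, counted or not in S)."""
    current = abs(S_a / X_a - S_b / X_b)
    candidates = []
    # Removals from group a (keeping at least one a-example so D' stays auditable).
    if X_a >= 2 and S_a >= 1:          # remove an a-example satisfying the measure
        candidates.append(abs((S_a - 1) / (X_a - 1) - S_b / X_b))
    if X_a >= 2 and X_a - S_a >= 1:    # remove an a-example not satisfying it
        candidates.append(abs(S_a / (X_a - 1) - S_b / X_b))
    # Removals from group b, symmetrically.
    if X_b >= 2 and S_b >= 1:
        candidates.append(abs(S_a / X_a - (S_b - 1) / (X_b - 1)))
    if X_b >= 2 and X_b - S_b >= 1:
        candidates.append(abs(S_a / X_a - S_b / (X_b - 1)))
    if not candidates:
        return 0.0
    return max(abs(c - current) for c in candidates)

# Larger groups make the unfairness measure smoother (smaller sensitivity).
print(local_sensitivity(S_a=2, X_a=3, S_b=1, X_b=2))      # small groups: 0.5
print(local_sensitivity(S_a=20, X_a=30, S_b=10, X_b=20))  # same ratios, larger groups: ~0.026
```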

3.2.3 Conditions for perfect statistical fairness metrics sample-robustness

One important implication of Proposition 5 is that the worst-case fairness violation increase induced by a minimal extension of any perturbation set is, in the general case, greater than 0 and can be computed exactly. Hence, it is in general not possible for a classifier h to be perfectly fair over all our perturbation sets. Formally, we say that h achieves perfect fairness sample-robustness given unfairness tolerance \(\epsilon\) if and only if \(\forall d \in [0,1], \forall {\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d),~ \mathsf {unf}{}(h,{\mathcal {D}}{}') \le \epsilon\). In other words, h achieves perfect fairness sample-robustness on \({\mathcal {D}}{}\) if and only if it meets the desired fairness constraint over all possible subsets of \({\mathcal {D}}{}\). In the general case, for common statistical fairness metrics, it is not possible for h to be fair for all \(0<d\le 1\). Indeed, it is possible to build a subset of \({\mathcal {D}}{}\) on which unfairness is exactly 1.0 (hence violating any fairness constraint \(\epsilon < 1\)). Given the values of \(S_i^{\mathcal {D}}{}\) and \(X_i^{\mathcal {D}}{}\) for all protected groups i (here, \(i \in \{a,b\}\)), we can easily check whether the corresponding classifier h satisfies perfect fairness sample-robustness without building any subsets. The following proposition gives necessary and sufficient conditions for the infeasibility of perfect fairness sample-robustness for h.

Proposition 6

(Necessary and sufficient conditions for perfect fairness sample-robustness infeasibility) Consider a dataset \({\mathcal {D}}{}\) and a classifier h, as well as a maximum acceptable unfairness \(\epsilon < 1\). Perfect fairness sample-robustness of h on \({\mathcal {D}}{}\) is infeasible, that is \(\exists {\mathcal {D}}{}' \subseteq {\mathcal {D}}{}\) such that \(\mathsf {unf}{}(h,{\mathcal {D}}{}') = 1.0\), if and only if one of the following two conditions holds:

  1. 1.

    \(S_a^{\mathcal {D}}{} > 0\) and \(S_b^{\mathcal {D}}{}<X_b^{\mathcal {D}}{}\). In this case, \({\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)\), for all \(d \ge \frac{1}{|{\mathcal {D}}{} |} (X_a^{\mathcal {D}}{}-S_a^{\mathcal {D}}{}+S_b^{\mathcal {D}}{})\).

  2. 2.

    \(S_b^{\mathcal {D}}{} > 0\) and \(S_a^{\mathcal {D}}{}<X_a^{\mathcal {D}}{}\). In this case, \({\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)\), for all \(d \ge \frac{1}{|{\mathcal {D}}{} |} (X_b^{\mathcal {D}}{}-S_b^{\mathcal {D}}{}+S_a^{\mathcal {D}}{})\).

Proof

The gist of the proof is that we can, by removing a sufficiently large number of examples from \({\mathcal {D}}{}\), get a subset \({\mathcal {D}}{}'\) which exhibits the worst possible unfairness \(\mathsf {unf}{}(h,{\mathcal {D}}{}')=1.0\). Indeed, to reach this value, one of the two rate measures \(\frac{S_a^{{\mathcal {D}}{}'}}{X_a^{{\mathcal {D}}{}'}}\) or \(\frac{S_b^{{\mathcal {D}}{}'}}{X_b^{{\mathcal {D}}{}'}}\) has to be brought to 0 while the other reaches 1. To achieve value 0 for the ratio \(\frac{S_i^{{\mathcal {D}}{}'}}{X_i^{{\mathcal {D}}{}'}}\), we need to remove \(S_i^{{\mathcal {D}}{}}\) instances from \({\mathcal {D}}{}\) (those satisfying the measure), and to reach value 1.0, we must remove \(X_i^{{\mathcal {D}}{}} - S_i^{{\mathcal {D}}{}}\) examples (those not satisfying the measure). Finally, we can either remove \(X_a^{{\mathcal {D}}{}}-S_a^{{\mathcal {D}}{}}+S_b^{{\mathcal {D}}{}}\) or \(X_b^{{\mathcal {D}}{}}-S_b^{{\mathcal {D}}{}}+S_a^{{\mathcal {D}}{}}\) carefully chosen examples from \({\mathcal {D}}{}\). In Case 1, we bring the a-ratio to 1 and the b-ratio to 0. In Case 2, the roles are reversed. If neither Case 1 nor Case 2 can be applied, then necessarily either \(S_i^{\mathcal {D}}{} = 0\) for all protected groups or \(S_i^{\mathcal {D}}{} = X_i^{\mathcal {D}}{}\) for all protected groups. In that case, perfect fairness sample-robustness is achieved (which is discussed later in Proposition 8). \(\square\)

Proposition 6 shows that a minimum finite value of d can be easily computed to define the smallest \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) containing a subset for which unfairness is exactly 1.0 (i.e., fairness is 0.0 - the worst possible value). This illustrates the fact that considering too large values of d may not make sense and that the constraints defined in Problem (5) are very strong. For example, for the Statistical Parity metric, this implies that any classifier with non-constant prediction across protected groups a and b (hence, \(X_a^{\mathcal {D}}{}> S_a^{\mathcal {D}}{} > 0\) or \(X_b^{\mathcal {D}}{}> S_b^{\mathcal {D}}{} >0\) for this metric) may be exactly fair on \({\mathcal {D}}{}\) but exactly unfair for some subset of \({\mathcal {D}}{}\) contained in \({\mathcal {B}}{}({\mathcal {D}}{}, d)\). Additionally, Proposition 6 shows how this value of d can be computed. Based on the principles of this proposition, we infer in Proposition 7 a simple, yet powerful, sufficient condition for perfect fairness sample-robustness infeasibility based on the unfairness measure over \({\mathcal {D}}{}\).
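Proposition 6 translates into a simple check over the four counts. The sketch below returns the smallest Jaccard distance at which a totally unfair subset exists, or None when perfect fairness sample-robustness holds (see Proposition 8 below); the function name and the explicit dataset size parameter n are assumptions made for this example.

```python
def min_distance_to_total_unfairness(S_a, X_a, S_b, X_b, n):
    """Smallest Jaccard distance d such that B(D, d) contains a subset D' with
    unf(h, D') = 1.0 (Proposition 6), where n = |D|; returns None when perfect
    fairness sample-robustness holds (Proposition 8)."""
    candidates = []
    if S_a > 0 and S_b < X_b:   # Case 1: bring the a-ratio to 1 and the b-ratio to 0
        candidates.append((X_a - S_a + S_b) / n)
    if S_b > 0 and S_a < X_a:   # Case 2: bring the b-ratio to 1 and the a-ratio to 0
        candidates.append((X_b - S_b + S_a) / n)
    return min(candidates) if candidates else None

print(min_distance_to_total_unfairness(S_a=2, X_a=3, S_b=1, X_b=2, n=5))  # 0.4 (remove 2 of the 5 examples)
print(min_distance_to_total_unfairness(S_a=3, X_a=3, S_b=2, X_b=2, n=5))  # None (all ratios equal to 1)
print(min_distance_to_total_unfairness(S_a=0, X_a=3, S_b=0, X_b=2, n=5))  # None (all ratios equal to 0)
```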

Proposition 7

(Sufficient condition for perfect fairness sample-robustness infeasibility) Consider a dataset \({\mathcal {D}}{}\) and a classifier h. If \(\mathsf {unf}{}(h,{\mathcal {D}}{}) > 0\) then perfect fairness sample-robustness of h on \({\mathcal {D}}{}\) is infeasible for any maximum acceptable unfairness value \(\epsilon < 1\).

Proof

Assume that \(\mathsf {unf}{}(h,{\mathcal {D}}{}) > 0\), which means that \(|\frac{S_a^ {\mathcal {D}}{}}{X_a^ {\mathcal {D}}{}} - \frac{S_b^ {\mathcal {D}}{}}{X_b^ {\mathcal {D}}{}} |> 0\), hence either \(S_a^{\mathcal {D}}{} > 0\) or \(S_b^{\mathcal {D}}{} > 0\) (or both). We also observe that either \(S_a^{\mathcal {D}}{} < X_a^{\mathcal {D}}{}\) or \(S_b^{\mathcal {D}}{} < X_b^{\mathcal {D}}{}\) (or both). If \(S_a^{\mathcal {D}}{} > 0\), two cases are possible:

  • On the one hand, it may be that \(S_a^{\mathcal {D}}{} = X_a^{\mathcal {D}}{}\). Then, necessarily, \(S_b^{\mathcal {D}}{} < X_b^{\mathcal {D}}{}\) and Case 1 of Proposition 6 is applicable.

  • On the other hand, \(S_a^{\mathcal {D}}{} < X_a^{\mathcal {D}}{}\)

    • If \(S_b^{\mathcal {D}}{}=X_b^{\mathcal {D}}{}\), then Case 2 of Proposition 6 is applicable.

    • If \(S_b^{\mathcal {D}}{}=0\), then Case 1 of Proposition 6 is applicable.

    • If \(0<S_b^{\mathcal {D}}{}<X_b^{\mathcal {D}}{}\) then either Case 1 or Case 2 of Proposition 6 are applicable.

A similar disjunction can be conducted over possible values of \(S_b^{\mathcal {D}}{}\). Observe that if \(\mathsf {unf}{}(h,{\mathcal {D}}{})=0\) then we cannot conclude without looking at \(S_a^{\mathcal {D}}{}\) and \(S_b^{\mathcal {D}}{}\). \(\square\)

We now formulate necessary and sufficient conditions for guaranteeing perfect fairness sample-robustness.

Proposition 8

(Necessary and sufficient conditions for perfect fairness sample-robustness) Consider a dataset \({\mathcal {D}}{}\) and a classifier h as well as a maximum acceptable unfairness \(\epsilon < 1\). Perfect fairness sample-robustness is guaranteed if and only if one of the following conditions holds:

  1. 1.

    \(S_i^{\mathcal {D}}{} = X_i^{\mathcal {D}}{}\) for all protected groups i (here, \(i \in \{a,b\}\)).

  2. 2.

    \(S_i^{\mathcal {D}}{} = 0\) for all protected groups i (here, \(i \in \{a,b\}\)).

Proof

If \(S_i^{\mathcal {D}}{} = X_i^{\mathcal {D}}{}\) for all protected groups i, then the ratio associated to all protected groups is exactly 1.0 and cannot be modified by removing examples. Similarly, if \(S_i^{\mathcal {D}}{} = 0\) for all protected groups i, then the ratio associated to all protected groups is exactly 0.0 and cannot be modified by removing examples. Finally, in any other case, either Points 1 or 2 of Proposition 6 can be applied to prove the infeasibility of perfect fairness sample-robustness. \(\square\)

The different possible cases mentioned in Propositions 6 and 8 are summarized in Fig. 2. Observe that, as expected, the conditions established in Proposition 8 are equivalent to the negation of those formulated in Proposition 6.

Fig. 2

The different possible situations to establish perfect fairness sample-robustness (Yes) or its impossibility (No)

3.2.4 Implications of perfect statistical fairness metrics sample-robustness

Based on the conditions established in Proposition 8, we now study the implications for h of being exactly fair for all our perturbation sets for statistical fairness metrics.

Consider a dataset \({\mathcal {D}}{}\) and a classifier h satisfying perfect fairness sample-robustness. On the one hand, it means that h is perfectly fair on \({\mathcal {D}}{}\) and all its subsets, which can be desirable. On the other hand, depending on the fairness metric at hand, such a model may not be interesting, as we show below.

3.2.4.1 Statistical Parity

For the Statistical Parity metric, it means that h’s predictions are constant for all instances of groups a and b. This conflicts strongly with utility and may result in h being a trivial model.

3.2.4.2 Predictive Equality

For the Predictive Equality metric, it means that either all negative samples of groups a and b are well classified (True Negative Rates are 1.0), or they are all misclassified (False Positive Rates are 1.0). Hence, a perfectly robust-fair model for this metric would either be 100% accurate over negative samples, or 100% inaccurate over such examples. Observe that the first case is desirable, but also easily reachable by a trivial classifier constantly predicting the negative class.

3.2.4.3 Equal Opportunity

For the Equal Opportunity metric, it means that either all positive samples of groups a and b are well classified (True Positive Rates are 1.0) or they are all misclassified (False Negative Rates are 1.0). Hence, a perfectly robust-fair model for this metric would also be either 100% accurate over positive samples, or 100% inaccurate over such examples. Observe that the first case is desirable, but also easily reachable by a trivial classifier constantly predicting the positive class.

3.2.4.4 Equalized Odds

The Equalized Odds metric is the conjunction of Predictive Equality and Equal Opportunity. Hence, a perfectly robust-fair model for this metric would be 100% accurate over its training set (or 100% inaccurate over its training set - in which case inverting its predictions would be sufficient).

These results illustrate the strength of our robustness notion. However, they also suggest that a direct application may not be possible. Indeed, a training accuracy of 100% cannot be reached in general. In addition, it may not be desirable to reach such accuracy as it usually indicates overfitting. In the next subsection, we show how our framework can be used to quantify the sample-robustness of statistical fairness.

3.3 Maximal perturbation set ensuring statistical fairness constraint

We showed in Proposition 8 that in the special situations in which both \(S_a^{\mathcal {D}}{} = 0\) and \(S_b^{\mathcal {D}}{} = 0\), or both \(S_a^{\mathcal {D}}{} = X_a^{\mathcal {D}}{}\) and \(S_b^{\mathcal {D}}{} = X_b^{\mathcal {D}}{}\), h is perfectly fair over all the perturbation sets defined with respect to \({\mathcal {D}}{}\). However, as discussed earlier, perfect fairness sample-robustness may not be desirable, nor achievable.

Instead of trying to enforce perfect sample-robustness of statistical fairness for a classifier, we use our new framework to quantify a classifier’s fairness sample-robustness, as formalized in the following definition.

Definition 6

(Quantifying sample-robustness for fairness) Consider a dataset \({\mathcal {D}}{}\), a classifier h and an acceptable unfairness tolerance \(\epsilon\). The unfairness sample-robustness of h on \({\mathcal {D}}{}\) for constraint \(\epsilon\), denoted by \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\), is the Jaccard distance (\(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon ) \in [0,1]\)) such that:

  1.

    \(\forall d \ge \mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon ), \exists {\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)\) such that \(\mathsf {unf}{}(h,{\mathcal {D}}{}') > \epsilon\).

  2.

    \(\forall d < \mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon ), \forall {\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d), \mathsf {unf}{}(h,{\mathcal {D}}{}') \le \epsilon\).

In other words, \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) is the largest possible value of the Jaccard distance d such that h is fair over all sets in \({\mathcal {B}}{}({\mathcal {D}}{}, d'),\forall d' < d\).

Consider \({\mathcal {D}}{}\) and all its subsets as points in a metric space equipped with the Jaccard distance. Intuitively, \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) is the radius of the largest ball centered on \({\mathcal {D}}{}\) such that h is fair over all sample sets strictly contained within this ball. Put simply, h is fair on \({\mathcal {D}}{}\) and on subsets of \({\mathcal {D}}{}\) up to a (Jaccard) distance of \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\). The larger \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\), the more sample-robust h’s fairness is. The a- and b-ratios evolve non-linearly, so it is not possible to compute a simple bound as in Proposition 6 for values of \(\epsilon\) such that \(0< \epsilon < 1\). We therefore propose a simple constrained optimization problem, denoted by \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\), to compute the minimal number of examples that need to be removed from \({\mathcal {D}}{}\) to build a subset of examples on which h is not \(\epsilon\)-fair. Note, however, that the bounds proposed in Proposition 6 still hold, but are not tight (i.e., using these bounds we get an over-estimation of \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\)).
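
To make the definition concrete, consider a purely illustrative example (the counts below are hypothetical). For the Statistical Parity metric with two groups partitioning \({\mathcal {D}}{}\), suppose \(S_a^{\mathcal {D}}{}=45\), \(X_a^{\mathcal {D}}{}=100\), \(S_b^{\mathcal {D}}{}=40\), \(X_b^{\mathcal {D}}{}=100\) (so \(|{\mathcal {D}}{} |=200\)) and \(\epsilon =0.05\). Then \(\mathsf {unf}{}(h,{\mathcal {D}}{})=|0.45-0.40|=0.05\le \epsilon\), but removing a single example of group a that does not satisfy the measure yields

$$\begin{aligned} \left|\frac{45}{99} - \frac{40}{100}\right|\approx 0.055 > \epsilon . \end{aligned}$$

Hence the closest violating subset lies at Jaccard distance \(1/200\), and \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )=0.005\).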

Definition 7

(The integer program for quantifying sample-robustness for fairness) A solution of \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) is a tuple \((x_a,x_b,y_a,y_b)\), in which these four decision variables represent the number of examples to be removed from \({\mathcal {D}}{}\) to form a subset on which the unfairness constraint is violated.

More precisely, \(x_i\) represents the number of examples of group i satisfying the given measure (hence counted within both \(S_i^{\mathcal {D}}{}\) and \(X_i^{\mathcal {D}}{}\)), while \(y_i\) represents the number of examples of group i not satisfying the given measure (hence counted only within \(X_i^{\mathcal {D}}{}\)). The optimal solution of \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) is the one minimizing the total number of examples to be removed (7) to build the closest (in the Jaccard sense) subset of \({\mathcal {D}}{}\).

$$\begin{aligned} \mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon ): \qquad & &(6)\\ \min _{x_a,x_b,y_a,y_b} &\quad x_a+x_b+y_a+y_b &(7)\\ \text {s.t.~} &\quad \left|\frac{S_a^ {\mathcal {D}}{}-x_a}{X_a^ {\mathcal {D}}{}-x_a-y_a} - \frac{S_b^ {\mathcal {D}}{}-x_b}{X_b^ {\mathcal {D}}{}-x_b-y_b} \right|> \epsilon &(8)\\ &\quad 0 \le x_a \le S_a^{\mathcal {D}}{} &(9)\\ &\quad 0 \le x_b \le S_b^{\mathcal {D}}{} &(10)\\ &\quad 0 \le y_a \le X_a^{\mathcal {D}}{}-S_a^{\mathcal {D}}{} &(11)\\ &\quad 0 \le y_b \le X_b^{\mathcal {D}}{}-S_b^{\mathcal {D}}{} &(12)\\ &\quad x_a+y_a < X_a^{\mathcal {D}}{} &(13)\\ &\quad x_b+y_b < X_b^{\mathcal {D}}{} &(14) \end{aligned}$$

Constraint (8) encodes the fact that the fairness constraint must be violated on the resulting subset. Constraints (9) to (12) capture the variables’ domains. Finally, constraints (13) and (14) enforce that at least one example of each group is kept (otherwise unfairness is undefined).

Illustration of \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) for an example metric. For the Equal Opportunity metric, recall that \(S_i^{\mathcal {D}}{}\) is the number of positively labelled examples belonging to group i that are positively predicted by h (true positives). For this metric, \(X_i^{\mathcal {D}}{}\) is the total number of positively labelled examples belonging to group i. Then, \(x_i\) represents the number of examples removed from \({\mathcal {D}}{}\) that belong to group i and are positively labelled and positively predicted by h. Removing \(x_i\) such examples decrements both \(S_i^{\mathcal {D}}{}\) and \(X_i^{\mathcal {D}}{}\). On the other hand, \(y_i\) is the number of examples removed from \({\mathcal {D}}{}\) that belong to group i and are positively labelled but negatively predicted by h. Removing \(y_i\) such examples decrements only \(X_i^{\mathcal {D}}{}\).

In the next proposition, we show that \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) can be used to exactly compute a classifier’s fairness sample-robustness \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\).

Proposition 9

(Quantifying Sample-Robustness for fairness using \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\)) Let \((x_a^*, x_b^*, y_a^*,y_b^*)\) be the optimal solution of \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\). Then:

$$\begin{aligned} \mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon ) = \frac{x_a^*+x_b^*+y_a^*+y_b^*}{|{\mathcal {D}}{} |}. \end{aligned}$$

Proof

To prove this equality, we will need to prove the two conditions of Definition 6.

Let \(z^*=x_a^*+x_b^*+y_a^*+y_b^*\) be the value of the objective function of the optimal solution of \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\). We define \(d^* = \frac{z^*}{|{\mathcal {D}}{} |} = \frac{x_a^* + x_b^* + y_a^* + y_b^*}{|{\mathcal {D}}{} |}\). Then:

  1.

    Consider \({\mathcal {D}}{}^*\), the subset of \({\mathcal {D}}{}\) formed by removing \(x_a^*\) (respectively \(x_b^*\)) examples of group a (respectively b) satisfying the statistical criterion, and \(y_a^*\) (respectively \(y_b^*\)) examples of group a (respectively b) not satisfying the statistical criterion. The bounds of the decision variables of Problem (6) enforce that \({\mathcal {D}}{}^*\) exists. We have: \(J_\delta ({{\mathcal {D}}{},{\mathcal {D}}{}^*)} = \frac{x_a^* + x_b^* + y_a^* + y_b^*}{|{\mathcal {D}}{} |}=d^*\). Additionally, we know that \(\mathsf {unf}{}(h,{\mathcal {D}}{}^*) > \epsilon\), because \((x_a^*, x_b^*, y_a^*,y_b^*)\) is a solution of \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) and then necessarily satisfies Constraint (8). Hence, \(\forall d \ge d^*, \exists {\mathcal {D}}{}'={\mathcal {D}}{}^* \in {\mathcal {B}}{}({\mathcal {D}}{}, d)\) such that \(\mathsf {unf}{}(h,{\mathcal {D}}{}') > \epsilon\).

  2.

    Assume that \(\exists {\mathcal {D}}{}'' \in {\mathcal {B}}{}({\mathcal {D}}{}, d)\) with \(d < d^*\) such that \(\mathsf {unf}{}(h,{\mathcal {D}}{}'') > \epsilon\). Then, \({\mathcal {D}}{}''\) is formed by removing \(z'' < z^*\) examples from \({\mathcal {D}}{}\). In addition, \({\mathcal {D}}{}''\) is a solution to Problem (6) as \(\mathsf {unf}{}(h,{\mathcal {D}}{}'') > \epsilon\). This contradicts the fact that \(z^*\) is the optimal objective value of Problem (6). Hence, \(\forall d < d^*, \forall {\mathcal {D}}{}' \in {\mathcal {B}}{}({\mathcal {D}}{}, d), \mathsf {unf}{}(h,{\mathcal {D}}{}') \le \epsilon\).

Finally, by (1) and (2), \(d^* = \mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\). \(\square\)

\(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) can be solved using any Mixed-Integer Programming (MIP) solver (more implementation details are provided in Sect. 5.1.2). The main computational challenge is that Constraint (8) is non-linear. However, due to the modest size of the model, common solvers are able to solve the problem to optimality within a fraction of a second.
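
For concreteness, the sketch below shows one possible encoding of \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) with the OR-Tools CP-SAT solver, linearizing Constraint (8) by clearing its denominators and approximating \(\epsilon\) by a rational number. The group counts and the `solve_ipsr` helper are placeholders for illustration; this is a minimal sketch under these assumptions, not the authors' implementation.

```python
from fractions import Fraction
from ortools.sat.python import cp_model

def solve_ipsr(S_a, X_a, S_b, X_b, eps, dataset_size):
    """Sketch of IPSR(h, D, eps): minimal number of removals violating the
    epsilon-fairness constraint, given the group counts of h on D."""
    frac = Fraction(eps).limit_denominator(10 ** 6)
    eps_num, eps_den = frac.numerator, frac.denominator
    m = cp_model.CpModel()
    x_a = m.NewIntVar(0, S_a, "x_a")          # removed examples of group a satisfying the measure
    x_b = m.NewIntVar(0, S_b, "x_b")
    y_a = m.NewIntVar(0, X_a - S_a, "y_a")    # removed examples of group a not satisfying it
    y_b = m.NewIntVar(0, X_b - S_b, "y_b")
    num_a = m.NewIntVar(0, S_a, "num_a")      # S_a - x_a
    num_b = m.NewIntVar(0, S_b, "num_b")
    den_a = m.NewIntVar(1, X_a, "den_a")      # X_a - x_a - y_a >= 1 (Constraint 13)
    den_b = m.NewIntVar(1, X_b, "den_b")      # X_b - x_b - y_b >= 1 (Constraint 14)
    m.Add(num_a == S_a - x_a)
    m.Add(num_b == S_b - x_b)
    m.Add(den_a == X_a - x_a - y_a)
    m.Add(den_b == X_b - x_b - y_b)
    # Clear the denominators of Constraint (8) with auxiliary product variables.
    p_ab = m.NewIntVar(0, S_a * X_b, "p_ab")
    p_ba = m.NewIntVar(0, S_b * X_a, "p_ba")
    p_dd = m.NewIntVar(1, X_a * X_b, "p_dd")
    m.AddMultiplicationEquality(p_ab, [num_a, den_b])
    m.AddMultiplicationEquality(p_ba, [num_b, den_a])
    m.AddMultiplicationEquality(p_dd, [den_a, den_b])
    diff = m.NewIntVar(-S_b * X_a, S_a * X_b, "diff")
    m.Add(diff == p_ab - p_ba)
    abs_diff = m.NewIntVar(0, max(S_a * X_b, S_b * X_a), "abs_diff")
    m.AddAbsEquality(abs_diff, diff)
    # |ratio_a - ratio_b| > eps  <=>  eps_den * |p_ab - p_ba| >= eps_num * p_dd + 1 (integers)
    m.Add(eps_den * abs_diff >= eps_num * p_dd + 1)
    m.Minimize(x_a + x_b + y_a + y_b)
    solver = cp_model.CpSolver()
    status = solver.Solve(m)
    if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return solver.ObjectiveValue() / dataset_size   # SR(h, D, eps), cf. Proposition 9
    return 1.0   # infeasible: perfect fairness sample-robustness (cf. Proposition 8)
```

Note that infeasibility of the model corresponds exactly to the two situations of Proposition 8, in which case the sketch returns the trivial value 1.0.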

Additionally, based on the principles described in Proposition 5, we have designed a simple greedy algorithm \({\mathcal {G}}reedy\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) that can be used to approximate \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\). Its pseudo-code is depicted in Appendix A.

Intuitively, \({\mathcal {G}}reedy\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) starts with the entire dataset \({\mathcal {D}}{}\) and successively removes examples to build subsets of \({\mathcal {D}}{}\) until fairness is violated. At each step, it removes exactly one example from the current subset \({\mathcal {D}}{}^c\), chosen to maximize the increase in fairness violation. This value, as well as the associated example to be removed, can be computed in constant time using \(\gamma (h, {\mathcal {D}}{}^c)\) as defined in Proposition 5. Indeed, given \({\mathcal {D}}{}^c\), only four possible operations can modify fairness: removing an example of protected group a or b that either satisfies or does not satisfy the fairness criterion.

\({\mathcal {G}}reedy\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) comes with no optimality guarantee and we can easily craft instances on which it does not achieve optimality. However, it provides an upper-bound on \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\), and has polynomial \({\mathcal {O}}(|{\mathcal {D}}{} |)\) complexity, where \(|{\mathcal {D}}{} |\) is the number of examples in \({\mathcal {D}}{}\). We show empirically in Sect. 5.1.4 that it can approximate \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) well in practice.
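
An illustrative re-implementation of this greedy idea on the group counts is sketched below (two protected groups, counts assumed given, and each group assumed non-empty). It directly re-evaluates the four candidate ratios instead of using \(\gamma (h, {\mathcal {D}}{}^c)\), and the helper name is ours, not that of Appendix A.

```python
def greedy_sr(S_a, X_a, S_b, X_b, eps, dataset_size):
    """Greedy upper bound on SR(h, D, eps): repeatedly remove the single example
    whose removal increases the fairness violation the most."""
    removed = 0
    while abs(S_a / X_a - S_b / X_b) <= eps:
        candidates = []
        # Four possible operations: remove a (non-)satisfying example of group a or b,
        # keeping at least one example per group (Constraints 13 and 14).
        if S_a > 0 and X_a > 1:
            candidates.append((abs((S_a - 1) / (X_a - 1) - S_b / X_b), "a_sat"))
        if X_a - S_a > 0 and X_a > 1:
            candidates.append((abs(S_a / (X_a - 1) - S_b / X_b), "a_unsat"))
        if S_b > 0 and X_b > 1:
            candidates.append((abs(S_a / X_a - (S_b - 1) / (X_b - 1)), "b_sat"))
        if X_b - S_b > 0 and X_b > 1:
            candidates.append((abs(S_a / X_a - S_b / (X_b - 1)), "b_unsat"))
        if not candidates:            # no further removal possible: return the trivial bound
            return 1.0
        _, op = max(candidates)       # pick the operation maximizing the violation
        if op == "a_sat":
            S_a, X_a = S_a - 1, X_a - 1
        elif op == "a_unsat":
            X_a -= 1
        elif op == "b_sat":
            S_b, X_b = S_b - 1, X_b - 1
        else:
            X_b -= 1
        removed += 1
    return removed / dataset_size
```

Each iteration performs a constant amount of work and removes one example, which matches the \({\mathcal {O}}(|{\mathcal {D}}{} |)\) complexity stated above.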

We formally defined our sample-robustness criterion for fairness, as well as an integer programming model to quantify it exactly. One of the strengths of the proposed approach is that it can be used to quantify the fairness robustness of any classifier h, given only access to its predictions. In particular, the approach is agnostic to the hypothesis class of h: no additional assumptions are necessary and black-box access to the model is sufficient. In the next subsection, we present the resulting learning problem. Afterwards, we discuss the practical issues with this formulation and show how it can be integrated within existing learning algorithms.

3.4 Integration with fair learning algorithms and practical challenges

As discussed previously, the use of the Jaccard distance to define the perturbation sets around \({\mathcal {D}}{}\) has some theoretical advantages. However, carefully calibrating the parameter d to avoid over-constraining the problem is necessary. In addition, the resulting problem may still be hard to solve in practice and/or penalize utility too much.

A direct use of our framework is to solve \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) and use the resulting \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) to quantify the fairness sample-robustness of classifier h. Integrating this term directly within the objective function of a learning algorithm might appear appealing. However, this would require solving \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) at each model update in order to audit the fairness sample-robustness. For instance, this is not trivial for gradient-based learning techniques, as \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) is not differentiable. In addition, sample-robustness values found by solving \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) depend on the dataset considered and its structure (in particular, through the cardinalities of the protected groups). This implies that a particular sample-robustness value may be satisfactory for a given task, but may not be meaningful for another dataset or another pair of protected groups. Furthermore, there is often an important gap between the sample-robustness of realistic, useful models and that of a constant classifier (which is 1.0). These observations make the integration of our robustness quantification into learning algorithms more difficult. We therefore formulate the sample-robust fair learning problem as a multi-objective problem, using an \(\epsilon\)-constraint method. In other words, considering the fair learning problem (3), we include our fairness sample-robustness term as a constraint:

$$\begin{aligned} \mathop {{\mathrm{arg\,min}}}\limits _{h \in {\mathcal {H}}}& \quad f_{obj}{}(h, {\mathcal {D}}{}) \nonumber \\ \text {s.t.}& \quad {\mathsf {unf}{}(h,{\mathcal {D}}{})} \le \epsilon \nonumber \\& \quad {\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )} \ge \mu \end{aligned}$$
(15)

Note that Problem (15) is indeed equivalent to Problem (5), reformulated to use the fairness sample-robustness quantification notion introduced in Sect. 3.3 (i.e., \(\mathcal {SR}(h,{\mathcal {D}}{},\epsilon )\) and the discussed tools to measure it). In particular, the \(\mu\) parameter of Problem (15) corresponds to the d parameter of Problem (5).

As discussed earlier, an important difficulty with Problem (15) is the calibration of the \(\mu\) parameter. More precisely, as a meaningful value of \(\mu\) depends on the dataset at hand, on the considered sensitive attributes, on the unfairness metric and on the unfairness constraint \(\epsilon\), determining a good value for \(\mu\) is difficult. For this reason, we propose to build a Pareto frontier between utility (\(f_{obj}{}(h, {\mathcal {D}}{})\)) and fairness sample-robustness (\(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\)), for a fixed value of \(\epsilon\). To do so, we first solve Problem (15) with no constraint on \({\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )}\) (\(\mu = 0\)). Then, we measure the sample-robustness of the resulting model \(h_0\) and solve Problem (15) again, using this value (\(\mathcal {SR}{}(h_0,{\mathcal {D}}{},\epsilon )\)) for the \(\mu\) parameter. We iterate this process until reaching a perfect sample-robustness of 1.0 (which can always be reached by building a trivial constant classifier). However, this approach has two drawbacks. First, it is not obvious which solution of the Pareto frontier should be kept. Second, as the process is sequential, it may take a long time to complete and this time is not predictable.

To address the first challenge, once a model’s training is finished, we propose to audit its fairness on a separate validation set. If the validation unfairness meets a given criterion (e.g., it is lower than the training unfairness or lower than \(\epsilon\)), we return this model. Otherwise, we strengthen the sample-robustness constraint and iterate. The second difficulty remains (even though the validation step may stop the process earlier instead of building the entire Pareto frontier).
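
The overall procedure with the \(\epsilon\) stopping criterion can be summarized by the following sketch. The helpers `solve_problem_15`, `sample_robustness` and `unfairness` are assumed oracles (a constrained fair learner, the \(\mathcal {IPSR}{}\) audit, and the metric), and the increment `step` used to strengthen \(\mu\) between iterations (e.g., \(1/|{\mathcal {D}}{} |\), the smallest meaningful change) is our assumption, not a prescription of the method.

```python
def sample_robust_frontier(D_train, D_val, eps, solve_problem_15,
                           sample_robustness, unfairness, step):
    """Sketch of the iterative procedure: solve Problem (15) with an increasing
    sample-robustness constraint mu, stopping via the validation criterion."""
    mu, frontier = 0.0, []
    while True:
        h = solve_problem_15(D_train, eps, mu)        # fair model with SR(h, D, eps) >= mu
        sr = sample_robustness(h, D_train, eps)       # audited exactly via IPSR
        frontier.append((h, sr))
        if unfairness(h, D_val) <= eps or sr >= 1.0:  # epsilon criterion (or trivial model)
            return h, frontier
        mu = sr + step                                # strengthen the constraint and iterate
```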

To summarize, we proposed a sample-robustness framework for statistical fairness. After characterizing the structure of our perturbation sets and the resulting learning problem, we showed how it can be integrated within existing algorithms. We further conduct an experimental evaluation of this approach in Sect. 5.1. However, practical difficulties remain, such as a significant computational overhead and integration challenges (solving a MIP within a learning algorithm). These challenges motivate the heuristic formulation of the problem introduced in the next section.

4 A heuristic method to improve fairness sample-robustness

In this section, we propose a heuristic method designed to improve fairness sample-robustness, without exhibiting some of the practical limitations of the exact approach proposed in the previous section. First, we introduce this heuristic method before showing how it can be integrated into two state-of-the-art fair learning algorithms.

4.1 Approximating the perturbation sets

We have shown that an exact application of our proposed formulation is possible, but challenging. In practice, a heuristic application of our proposed principle can be beneficial, even if no formal guarantees hold. The approach we propose consists in computing \(n{}\) random subsets of the training set using \(n{}\) random binary masks. Each mask \({\mathcal {M}}_{i}\) is a vector of size \(|{\mathcal {D}}{} |\), in which each coordinate \({\mathcal {M}}_{ij} \in \{0,1\}\) (\(i\in \{1 \ldots n{} \}\) and \(j\in \{1 \ldots |{\mathcal {D}}{} |\}\)) is a random binary value. We denote by \({\mathcal {D}}{}_{i}\) the subset associated with mask \({\mathcal {M}}_i\), defined as \({\mathcal {D}}_{i} = \{ e_j \in {\mathcal {D}}{} \mid {\mathcal {M}}_{i,j} =1 \}\). This is used in Definition 8 to define the heuristic perturbation set.

Definition 8

(Heuristic perturbation sets) Consider a dataset \({\mathcal {D}}{}\) and a set of \(n{}\) binary masks \({\mathcal {M}}_1 \ldots {\mathcal {M}}_{n{}}\) of size \(|{\mathcal {D}}{} |\). The heuristic perturbation set, denoted by \({\mathcal {B}}_\omega {}({\mathcal {D}}{}, n{})\), is defined as: \({\mathcal {B}}_\omega {}({\mathcal {D}}{}, n{}) = \{{\mathcal {D}}{}, {\mathcal {D}}{}_1, {\mathcal {D}}{}_2, \ldots , {\mathcal {D}}{}_{n{}} \}\).

In a nutshell, instead of considering the entire previously defined perturbation set \({\mathcal {B}}{}({\mathcal {D}}{}, d)\), we only enforce fairness on some randomly generated subsets (belonging to \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) by construction). Intuitively, \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) considers all subsets of \({\mathcal {D}}{}\) whose Jaccard distance from \({\mathcal {D}}{}\) is at most d. In contrast, \({\mathcal {B}}_\omega {}({\mathcal {D}}{}, n{})\) only considers n random subsets of \({\mathcal {D}}{}\) (along with \({\mathcal {D}}{}\) itself). In the graph representation of Fig. 1, our heuristic perturbation sets contain randomly selected vertices.
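
Generating the heuristic perturbation set and evaluating the worst-case unfairness over it can be done in a few lines, as sketched below. Here `unfairness` is an assumed metric oracle taking a model and a dataset, datasets are represented as NumPy arrays indexed by boolean masks, and the function names are ours.

```python
import numpy as np

def heuristic_perturbation_set(dataset_size, n, seed=0):
    """Return boolean masks defining D itself plus n random subsets of D."""
    rng = np.random.default_rng(seed)
    masks = [np.ones(dataset_size, dtype=bool)]                   # D itself
    masks += [rng.integers(0, 2, size=dataset_size).astype(bool)  # D_1 ... D_n
              for _ in range(n)]
    return masks

def worst_case_unfairness(h, D, masks, unfairness):
    """Maximum unfairness of h over the heuristic perturbation set B_omega(D, n)."""
    return max(unfairness(h, D[mask]) for mask in masks)
```

Enforcing the heuristic constraint then amounts to requiring `worst_case_unfairness(h, D, masks, unfairness) <= eps` instead of a constraint over the full perturbation set.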

By replacing \({\mathcal {B}}{}({\mathcal {D}}{}, d)\) by \({\mathcal {B}}_\omega {}({\mathcal {D}}{}, n{})\) in Problem (5), we get the heuristic formulation of our sample-robust fair learning problem. The intuition behind this heuristic approach is that the randomly sampled subsets of \({\mathcal {D}}{}\) have slightly different distributions. Hence, enforcing fairness for such subsets effectively leads to a form of heuristic distributionally robust optimization. It is possible to draw a parallel with the Bagging (Bootstrap AGGregatING) ensemble learning method (Zhou 2012). Indeed, the idea underlying bagging is that training different models using different samplings of the training set may improve robustness by reducing the variance. This happens because such samplings have slightly different distributions, neighbouring the original one. While bagging leverages the different samplings to learn a set of models that will reduce the variance of the accuracy, we use them to enforce fairness in a robust manner.

This heuristic formulation does not have the theoretical appeal of our exact sample-robustness quantification framework, but it exhibits considerable practical advantages. Indeed, it does not require calibrating the \(\mu\) parameter of Problem (15), which explains why we no longer need a separate validation set. In addition, computing unfairness over a finite set of subsets defined with masks can be done in linear time with respect to the input size, which is considerably simpler than solving \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\). It is also easier to integrate within existing algorithms (in particular, gradient-based techniques, as we show later).

Compared to the approach of Mandal et al. (2020), our heuristic method for robust fair learning does not come with theoretical guarantees, but its simplicity provides practical advantages in terms of scalability and applicability. First, it can be easily integrated into most fair classification techniques and does not require access to a cost-sensitive learning algorithm (in cost-sensitive learning, the instances of the training set are associated with weights that define their contribution to the objective function value). Second, unlike Cotter et al. (2018, 2019a), we do not require a prior split of the data. Finally, as we show in the experiments section, our heuristic method can be efficiently integrated within fair learning algorithms and empirically improves fairness generalization.

In the next two sections, we show how to include our heuristic method into two state-of-the-art fair classification techniques (solving Problem (3)) that have different characteristics. The first one is an exact branch-and-bound algorithm that builds inherently interpretable models. It works with binary data and binary protected group membership. As interpretability is becoming a key property for machine learning models (Freitas 2014; Rudin 2019), we believe that our method could be applicable in a wide range of contexts.

The second one is based on a two-player game formulation of constrained optimization. It uses gradient-based techniques without necessarily binarizing the data and handles fairness for any number of protected groups. Both methods are metric-agnostic and could be used to enforce any statistical fairness measure.

4.2 Integration with FairCORELS

FairCORELS (Aïvodji et al. 2019, 2021) is an extension of the CORELS (Angelino et al. 2017, 2018) algorithm that builds fair rule lists on binary datasets (attributes and labels) given two protected groups. A rule list (Rivest 1987) is a classification model defined by an ordered list of if-then rules (with a default prediction if none of the rules applies). Given a collection of rules (consisting of combinations of attributes mined as a preprocessing step), FairCORELS certifiably builds the rule list with the best objective value among those meeting a given statistical fairness constraint. It is a branch-and-bound algorithm that represents the search space of rule lists \({\mathcal {R}}{}\) using a prefix tree. In this prefix tree, each node is a rule and each path from the root to a node is a prefix (an ordered set of rules), which can be extended with a default decision to form a potential solution. Leveraging a collection of bounds, FairCORELS explores this search space using a given search heuristic and updates the current best solution only when the candidate model has an unfairness value at most equal to a given \(\epsilon\). Denoting by \(\mathsf {misc}{}(.)\) the misclassification error, \(\mathsf {unf}{}(.)\) the unfairness oracle, \(K_r\) the length of the prefix of rule list r and \(\lambda\) a regularization parameter penalizing longer rule lists, FairCORELS solves the following constrained optimization problem:

$$\begin{aligned} \mathop {{\mathrm{arg\,min}}}\limits _{r \in {\mathcal {R}}}&\quad f_{obj_{{{\texttt {FairCORELS}}}}} = \mathsf {misc}{}(r, {\mathcal {D}}{}) + \lambda \cdot K_r \nonumber \\ \text {s.t.}&\quad \mathsf {unf}{}(r,{\mathcal {D}}{}) \le \epsilon \end{aligned}$$

Integrating the proposed heuristic perturbation sets into the FairCORELS algorithm is quite simple. Whenever an evaluated rule list improves over the current best objective function, it is accepted only if its unfairness is at most \(\epsilon\) on the training set and on each of its subsets defined by the \(n{}\) masks. The modified algorithm thus searches for the rule list solving the following problem:

$$\begin{aligned} \mathop {{\mathrm{arg\,min}}}\limits _{r \in {\mathcal {R}}}&\quad f_{obj_{{{\texttt {FairCORELS}}}}} = \mathsf {misc}{}(r, {\mathcal {D}}{}) + \lambda \cdot K_r \nonumber \\ \text {s.t.}& \quad \underset{{\mathcal {D}}{}' \in {\mathcal {B}}_\omega {}({\mathcal {D}}{}, n{})}{\max } {\mathsf {unf}{}(r,{\mathcal {D}}{}')} \le \epsilon \end{aligned}$$

In practice, it is often not necessary to compute \(\mathsf {unf}{}(.)\) for every subset. We only compute these quantities if the candidate prefix improves over the current best one and meets the fairness constraint on the training set. In this case, the subsets’ unfairness values are computed sequentially, stopping early if the constraint is violated on any of them. This efficient implementation induces no significant computational overhead compared to the original FairCORELS.
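
The acceptance test can be summarized by the sketch below. The helper names are hypothetical and the objective is assumed to be minimized; this is an illustration of the logic, not the actual FairCORELS code.

```python
def accept_candidate(rule_list, objective_value, best_objective, D, masked_subsets,
                     eps, unfairness):
    """Accept a candidate rule list as the new best solution only if it improves the
    objective and is eps-fair on D and on every masked subset (with early stopping)."""
    if objective_value >= best_objective:
        return False
    if unfairness(rule_list, D) > eps:          # fairness on the full training set first
        return False
    for D_i in masked_subsets:                  # then on each subset, stop at the first violation
        if unfairness(rule_list, D_i) > eps:
            return False
    return True
```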

4.3 Integration with TFCO

TensorFlow Constrained OptimizationFootnote 3 (TFCO) is a Python library for optimizing inequality-constrained problems in TensorFlow to produce machine learning models (it is not restricted to the fair learning problem). It implements the method of Cotter et al. (2019b), formulating the constrained optimization problem as a two-player game. Considering the Lagrangian relaxation of the problem, the first player (\(\theta\)-player) optimizes the model’s parameters to minimize the objective function, while the second player (\(\lambda\)-player) updates the Lagrange multipliers to approximate the strongest Lagrangian relaxation. While the original fairness constraints are non-differentiable proportions (linear combinations of indicators), TFCO allows for the computation of the objective and of proxy constraints as hinge upper bounds of the real quantities, which enables the use of gradient-based techniques. In this setting, Cotter et al. (2019b) propose the Proxy Lagrangian framework. The latter reduces the constrained optimization problem to a two-player non-zero-sum game, in which the “learner” optimizes the model’s parameters to minimize the objective function including the proxy constraints, while the “auditor” updates the Lagrange multipliers based on the true constraints’ violations.

In the general context of constrained optimization, the proxy Lagrangians associated with the two players, optimizing an objective function \(g_0(\theta )\) under m constraints \(g_i, i \in [m]\), are:

$$\begin{aligned} {\mathcal {L}}_\theta (\theta ,\lambda )&= \lambda _1g_0(\theta )+\sum _{i=1}^{m}{\lambda _{i+1}{\tilde{g}}_i(\theta )}\nonumber \\ {\mathcal {L}}_\lambda (\theta ,\lambda )&= \sum _{i=1}^{m}{\lambda _{i+1}g_i(\theta )} \end{aligned}$$
(16)

in which \(\lambda _j\) are the Lagrange multipliers, \(g_i\) measures violation of constraint i and \({\tilde{g}}_i\) is its differentiable proxy.

Integrating our heuristic perturbation sets into the TFCO framework does not require modifying the library. Indeed, it simply consists in adding constraints to the declared optimization problem, to enforce the fairness constraints on the subsets of the training set defined by the \(n{}\) masks. Formally, we add one constraint per protected group per mask. When dealing with m protected groups, the original fair formulation includes m constraints (bounding a statistical measure’s difference between each protected group and the overall training set). Our heuristic sample-robust method declares \(m \cdot (n{}+1)\) fairness constraints (enforcing the m fairness constraints on the \(n{}+1\) sets of \({\mathcal {B}}_\omega {}({\mathcal {D}}{}, n{})\)), which are included in the objective function and weighted with Lagrange multipliers, following the approach of Cotter et al. (2019b), as in Eq. (16). Overall, integrating our proposed heuristic approach within TFCO is quite straightforward. One may observe that it would not be the case for the exact method proposed in Sect. 3, as the output of \(\mathcal {IPSR}{}\) is not differentiable.
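
For concreteness, a minimal sketch of this constraint construction using TFCO's rate helpers is given below, following the library's documented usage. It takes Statistical Parity as an example constraint; `predictions`, `labels`, `groups` and `masks` are placeholders, the one-sided form of the constraint is illustrative, and the exact rate expressions should be adapted to the metric at hand (this is not the authors' code).

```python
import numpy as np
import tensorflow_constrained_optimization as tfco

def sample_robust_problem(predictions, labels, groups, masks, eps):
    """Declare m*(n+1) fairness constraints: one per protected group and per element
    of the heuristic perturbation set (the full training set plus the n masked subsets)."""
    context = tfco.rate_context(predictions, labels=labels)
    subsets = [np.ones(len(labels), dtype=bool)] + list(masks)   # D itself, then D_1..D_n
    constraints = []
    for subset in subsets:
        subset_context = context.subset(subset)
        for g in np.unique(groups):
            group_context = context.subset(subset & (groups == g))
            # Statistical Parity example: on each subset, a group's positive prediction
            # rate must not exceed the subset's overall rate by more than eps.
            constraints.append(
                tfco.positive_prediction_rate(group_context)
                <= tfco.positive_prediction_rate(subset_context) + eps)
    return tfco.RateMinimizationProblem(tfco.error_rate(context), constraints)
```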

5 Experiments

We now empirically evaluate the proposed sample-robustness approaches for statistical fairness over a variety of datasets, sensitive attributes and statistical fairness metrics, and with two fair learning algorithms from the literature.

In a first subsection, we compare the exact and heuristic formulations, both integrated within the FairCORELS algorithm. Afterwards, in a second subsection, we demonstrate the effectiveness of our heuristic formulation within TFCO and compare it to a state-of-the-art technique for improving statistical fairness generalization (Cotter et al. 2018, 2019a).

5.1 Integration into FairCORELS and comparison between exact and heuristic methods

In this section, we integrate and evaluate our exact and heuristic methods within the FairCORELS algorithm. We first introduce the considered setup and define the different methods implemented. Then, we show that the exact formulation effectively improves unfairness generalization through the performed iterations. In the fourth and fifth subsections, we show that our heuristic method improves fairness sample-robustness and statistical fairness generalization. Finally, we compare the exact and heuristic approaches in terms of fairness sample-robustness and learning quality (trade-offs between accuracy and fairness at test time).

5.1.1 Setup

Our experiments cover four binarized datasets widely used in the fair learning literature. For each dataset, the mined rules are single- and two-clause antecedents (i.e., conjunctions of at most two attributes or their negation).

  • Adult Income datasetFootnote 4 (Frank and Asuncion 2010). This dataset gathers records of more than 45,000 individuals from the 1994 U.S. census. We consider the binary classification task of predicting whether an individual earns more than $50,000 per year. We use the same preprocessing as Aïvodji et al. (2019, 2021), considering gender to be the binary sensitive attribute (Male or Female).

  • COMPAS datasetFootnote 5 (analyzed by Angwin et al. (2016)). We consider the same discretized dataset used to evaluate CORELS in Angelino et al. (2017). The associated label/decision is whether the person will re-offend (recidivate) within 2 years (yes or no), while the binary sensitive attribute is race (African-American or Caucasian). Rule mining is done similarly to Aïvodji et al. (2019, 2021).

  • Default of Credit Card Clients datasetFootnote 6 (Yeh and hui Lien 2009). The dataset is discretized using quantiles. The discrete attributes were used to generate single- and two-clause rules, and only rules with support higher than 0.5 were kept. The associated decision is whether the person will default in payment (the next time they use their credit card), with the sensitive attribute being gender (Male or Female). The resulting dataset contains 189 rules and 29,986 examples.

  • Bank Marketing datasetFootnote 7 (Moro et al. 2014). The dataset is discretized using quantiles. The discrete attributes are used to generate single- and two-clause rules, and only rules with support higher than 0.5 were kept. The associated decision is whether the person will subscribe to a term deposit. The resulting dataset contains 41,175 examples and 179 rules, among which the 2 protected attributes: age:30-60 and not_age:30-60.

For each dataset, we prevent the use of the sensitive attributes in the built models to avoid disparate treatment. For all experiments, we set the maximum number of nodes in FairCORELS’ prefix tree to \(2.5 \times 10^6\), along with other fixed parameters, such as the branching heuristic, determined after a preprocessing step.

5.1.2 Methods

Exact method We modified \({{{\texttt {FairCORELS}}}}{}\) to solve Problem (15)Footnote 8. Solving \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) is costly (even though it can be done in fractions of seconds in practice) and we should avoid doing it at each iteration of the learning algorithm. Within \({{{\texttt {FairCORELS}}}}{}\), \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) is solved only when the current best solution update subroutine is called (just as the mask-related constraints are verified in the integration of our heuristic approach). In other words, we only audit a model’s sample-robustness when it is about to become the new current best solution. This guarantees that the final solution meets the desired fairness sample-robustness constraint, while in practice performing a small number of calls to the solver. For our experiments, \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) is solved using the OR-Tools CP-SAT solver (Perron and Furnon 2019). Implementing \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) necessitates some reformulation, in particular regarding Constraint (8), which is non-linear and requires computing products of variables. Indeed, we can get rid of the divisions by multiplying both sides of the inequality by the (positive) product of the denominator variables, and by further using intermediate variables. We consider the four fairness metrics presented in Table 1 and an unfairness tolerance \(\epsilon \in \{0.02, 0.015, 0.01, 0.005\}\). Results for the different values of \(\epsilon\) show similar trends, hence we only report those for \(\epsilon =0.01\) for conciseness. We compare five variants based on our exact method:

  • We solve Problem (15) iteratively and update the fairness sample-robustness constraint \(\mu\) at each step, without a validation set, until \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon ) = 1\). More precisely, we build the entire Pareto frontier between accuracy and fairness sample-robustness, for a fixed value of \(\epsilon\). We call this set of models the sample robust fair frontier (no validation). While selecting a particular model within the built sequence remains an open problem, this makes it possible to visualize the different trade-offs obtained during the iterations. Among the built models, we select the non-constant one with the highest fairness sample-robustness. We call this the no validation (before-constant) method.

  • We solve Problem (15) iteratively with a validation set. We then obtain a Pareto frontier between training accuracy and fairness sample-robustness. This set of models is the sample robust fair frontier (validation). We leverage the validation set to define two stopping criteria. The first one, called validation (\(\epsilon\) criterion), stops iterating when the validation unfairness is below \(\epsilon\) (i.e., when the fairness constraint enforced on the training set is also met on the validation set). The second one is called validation (train unf. criterion). It stops iterating when the validation unfairness is smaller than or equal to the training one.

Heuristic method The integration of our heuristic approach within FairCORELS is described in Sect. 4.2.Footnote 9 For a number of masks \(n{} \in \{0, 10, 30\}\), we compute the training Pareto frontier (between accuracy and fairness) of FairCORELS, with a fixed list of 147 values for the unfairness tolerance \(\epsilon\) (spaced non-linearly, with higher density for tighter fairness constraints). By evaluating each obtained model on the corresponding test set, we obtain approximations of the test Pareto frontier of FairCORELS. We compute such frontiers for the four different statistical notions of fairness presented in Table 1. Based on this setup, we assess FairCORELS’ fairness generalization ability with (\(n{} \in \{10, 30\}\)) or without (\(n{} = 0\)) our proposed approach.

All reported values are averaged using 5-fold cross-validation, with all methods trained and evaluated on the same data splits. For the methods sample robust fair frontier (validation), validation (\(\epsilon\) criterion) and validation (train unf. criterion), part of the original training set is used for validation (and not for training the model).

5.1.3 Exact method: effects of fairness sample-robustness on fairness generalization and accuracy

In this subsection, we show that our exact method effectively improves the statistical fairness generalization. We also visualize the resulting trade-offs between accuracy and fairness sample-robustness.

Fig. 3: Training, test and validation (when applicable) unfairness of models generated by \({{{\texttt {FairCORELS}}}}\) through the iterations of our exact method (Statistical Parity metric, \(\epsilon = 0.01\))

Fig. 4: Fairness sample-robustness and accuracy of models generated by \({{{\texttt {FairCORELS}}}}\) through the iterations of our exact method (Statistical Parity metric, \(\epsilon = 0.01\))

Figures 3 and 4 illustrate the obtained results on the four datasets, for the Statistical Parity fairness metric (\(\epsilon =0.01\)). Results for the three other metrics (\(\epsilon =0.01\)), and for the four datasets, are given in Appendices C.1 and C.2. Detailed results for the Statistical Parity metric (including standard deviation) are included in Appendix C.3. In both figures, the sample-robust fair frontier (with or without validation set) presents all obtained models, sorted by their index in the sequence. This means that the leftmost point corresponds to the unconstrained model, while the rightmost point exhibits 1.0 fairness sample-robustness and is obtained after the last iteration of the process. Note that this last model is usually a constant classifier, as discussed in Sect. 3.2.4. However, this is not always the case, and for the Predictive Equality and Equal Opportunity metrics, some models achieve perfect sample-robustness without reaching trivial accuracy. This is possible if the built rule lists only make mistakes on the positively (respectively negatively) labelled instances. This is the case, for example, if the model’s prefix rules only capture positively (respectively negatively) labelled examples, while a default decision classifies negatively (respectively positively) the uncaptured ones.

Intuitively, we strengthen the fairness sample-robustness constraint of Problem (15) through the iterations of the method (numbered on the x-axis). This increases the fairness sample-robustness sequentially, as shown in Fig. 4. As expected, this effectively lowers the test unfairness (Fig. 3), but degrades the training and test accuracy (Fig. 4), which suggests a trade-off between accuracy and fairness sample-robustness. In Fig. 3, we see that considering the non-constant classifier with the highest fairness sample-robustness (no validation (before-constant) method) satisfies the fairness requirement at test time, which was precisely the aim of our method. However, we see in Fig. 4 that this fairness generalization improvement comes at a high cost in terms of accuracy (both at training and test time).

When using a separate validation set, we usually require fewer iterations and obtain models closer to the unfairness tolerance \(\epsilon\). Such models do not always meet the fairness constraint at test time, but still lead to a reduction of the test unfairness.

One may observe that the points associated with no validation (before-constant), validation (\(\epsilon\) criterion) and validation (train unf. criterion) do not lie on their associated curves. The reason is that the stopping criteria are applied separately on each fold (with the fold’s validation set). Hence, the models obtained with these methods are learnt with a different number of steps on different folds. The x-positioning of the points is then performed based on the average sample-robustness value within the corresponding frontier. Finally, we see that when the fairness constraint is already satisfied (as for the Bank Marketing dataset), the method still allows a reduction of the test unfairness, which may further strengthen fairness robustness.

The evaluation of our exact method’s test accuracy, as well as its comparison with our heuristic method, is performed in Sect. 5.1.6. In the next two subsections, we evaluate our heuristic method.

5.1.4 Heuristic method: fairness sample-robustness improvement results

In this part of our experiments, we use our heuristic sample-robust method within \({{{\texttt {FairCORELS}}}}{}\) and empirically show that it effectively improves the fairness sample-robustness. As stated previously, we compare three variants: the original FairCORELS (no mask) as well as FairCORELS modified with our heuristic sample-robust method for 10 masks and 30 masks.

We use \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) and \({\mathcal {G}}reedy\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) to audit a posteriori the fairness sample-robustness of the models built with each of the three methods. As discussed earlier, \({\mathcal {G}}reedy\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) provides no optimality guarantee, but it has polynomial complexity and can be used to upper-bound \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\). Conversely, \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\) computes the exact value of \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) but is computationally more expensive.

Fig. 5: Fairness sample-robustness of models generated by \({{{\texttt {FairCORELS}}}}\) using our heuristic method (Default of Credit Card Clients dataset)

To empirically demonstrate that our heuristic approach is suitable to improve sample-robustness, we report in Fig. 5 results for the four metrics (Statistical Parity, Predictive Equality, Equal Opportunity and Equalized Odds) on the Default of Credit Card Clients dataset. Results for the remaining datasets are given in Appendix C.4. When the enforced fairness constraint is strengthened (\(1-\epsilon\) grows), fairness sample-robustness decreases. We explain this by the fact that, as the fairness constraint becomes tighter, it is met on a smaller number of subsets of the training set. In particular, the radius of the ball (measured using the Jaccard distance) around the training dataset in which the fairness constraint is met everywhere (\(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\)) diminishes. However, we observe in Fig. 5 that our heuristic method is able to mitigate this decrease of the fairness sample-robustness. Additionally, there seems to be a correlation between the number of masks used and the resulting fairness sample-robustness. We note that \({\mathcal {G}}reedy\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) actually performs very well and provides a close over-approximation of \(\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\). For these experiments, \({\mathcal {G}}reedy\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) found the optimal solution (i.e., the value returned by \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\)) \(88\%\) of the time. For the remaining \(12\%\), the average gap to optimality is less than \(5\%\). Considering the four datasets and the four fairness metrics, \({\mathcal {G}}reedy\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) found the optimal solution for \(78\%\) of the executions. For the remaining \(22\%\), the average gap to optimality is around \(36\%\). This means that, for some executions, the value found by \({\mathcal {G}}reedy\mathcal {SR}{}(h,{\mathcal {D}}{},\epsilon )\) is considerably higher than the exact value computed by \(\mathcal {IPSR}{}(h,{\mathcal {D}}{},\epsilon )\), resulting in a loose upper bound.

We have shown that our heuristic method empirically improves fairness sample-robustness. In the next subsection, we show that, as a result, it also improves fairness generalization, hence allowing the construction of models with better test accuracy/fairness trade-offs in regimes of low unfairness.

5.1.5 Heuristic method: statistical fairness generalization improvement results

In this part of our experiments, we use our heuristic sample-robust method within \({{{\texttt {FairCORELS}}}}{}\) and empirically show that it effectively enhances the statistical fairness generalization.

Fig. 6: Results of our heuristic method (Adult Income dataset, Equal Opportunity metric)

Fig. 7: Results of our heuristic method (COMPAS dataset, Equal Opportunity metric)

Fig. 8: Results of our heuristic method (Default Credit dataset, Equal Opportunity metric)

Fig. 9: Results of our heuristic method (Marketing dataset, Equal Opportunity metric)

Figures 6, 7, 8 and 9 respectively present the performances of the three variants on the Adult Income, COMPAS, Default of Credit Card Clients and Bank Marketing datasets, for the Equal Opportunity metric (which is widely used in the literature). Results for the four datasets using all fairness metrics, which display similar trends, are included in Appendix C.5. Each figure has three graphs, with each point of a graph corresponding to a solution (averaged over the 5-fold cross-validation). Note that we focus on solutions whose unfairness is at most 0.05, because this part of the trade-offs (medium to strong fairness constraints) is the most interesting one in our experiments. It allows us to investigate unfairness generalization under meaningful constraints. To ease train/test comparison, the ranges of both axes of the first two plots are set identically.

The first graph is a Pareto frontier built on the training set, which displays the set of non-dominated solutions (in terms of unfairness and error) on the training set. Solutions closer to the lower left corner are preferable, as they correspond to lower error and lower unfairness. We observe that, in all cases, our heuristic method leads to worse trade-offs on the training set. This suggests that robustness comes at the cost of lower training performance, which is not really problematic as we prefer solutions exhibiting worse accuracy/fairness trade-offs at training time but generalizing better.

The second graph is the Pareto frontier built on the test set, which illustrates the effectiveness of our proposed approach. In all cases, using our heuristic method (with either 10 or 30 masks) leads to a denser Pareto frontier, exhibiting better accuracy/fairness trade-offs. This is particularly the case for tight fairness constraints, for which the standard FairCORELS fails to generate solutions with such low test unfairness.

The third graph illustrates the generalization of unfairness, presenting the test unfairness as a function of the training one. The ideal generalization scenario, in which the test unfairness is exactly equal to the training unfairness, is represented by the diagonal line. We observe that for tight fairness constraints (left side of the figure), solutions generated by the original FairCORELS generalize badly: they exhibit low training unfairness but considerably higher test unfairness. In contrast, solutions generated using our approach generalize considerably better as the corresponding points are closer to the ideal scenario.

Finally, all the presented results show that our heuristic method leads to models trading some training performance for fairness robustness, and exhibiting better accuracy/fairness trade-offs on unseen data (at test time). In particular, models learnt using our heuristic method have a considerably smaller unfairness generalization error, which allows populating areas of the test Pareto frontier that the original FairCORELS failed to fill.

5.1.6 Comparing the exact and heuristic approaches

We now compare the proposed exact and heuristic approaches on similar problems.

Fig. 10: Fairness sample-robustness of models generated by \({{{\texttt {FairCORELS}}}}\) using our exact and heuristic sample-robust fair methods (Statistical Parity metric, \(\epsilon = 0.01\))

Fig. 11: Test error and unfairness of models generated by \({{{\texttt {FairCORELS}}}}\) using our exact and heuristic sample-robust fair methods (Statistical Parity metric, \(\epsilon = 0.01\))

Figure 10 displays the built models’ sample-robustness for the Statistical Parity metric (\(\epsilon =0.01\)). Results for the three other metrics (\(\epsilon =0.01\)) and for the four datasets are given in Appendix C.6. We observe that both the heuristic and exact methods are able to improve fairness sample-robustness over the original FairCORELS. Overall, as expected, the exact method can lead to higher sample-robustness values, as the models are learnt with explicit constraints on this exact quantity.

Figure 11 shows the obtained test error/test unfairness trade-offs for the Statistical Parity metric (\(\epsilon =0.01\)). Results for the three other metrics (\(\epsilon =0.01\)) and for the four datasets are given in Appendix C.7. We observe that all the proposed methods usually diminish fairness violation at test time (the associated points are either under \(\epsilon\) or closer to it). This improvement in fairness generalization comes at a cost in terms of the model’s error. As a general trend, the greater the fairness generalization improvement, the greater the error incurred. However, the generated solutions often propose interesting trade-offs between error and unfairness. In particular, in the Bank Marketing experiment, the robustness enforced for fairness can sometimes benefit error generalization as well.

Finally, we showed that integrating our exact or heuristic methods within FairCORELS practically improves fairness generalization. While both approaches successfully enforce fairness robustness, the exact method is computationally more expensive, because it consists in a sequence of trainings (while the heuristic method only trains once). In addition, each such training is more expensive, as it requires solving an integer programming model. Overall, the heuristic approach seems more appealing for practical applications, thanks to its flexibility, computational efficiency and empirical effectiveness.

5.2 Heuristic method: integration into TFCO and comparison to a state-of-the-art method

The objective of this section is two-fold. First, we show that the use of our heuristic sample-robust method improves fairness generalization over the standard fair learning formulation using TFCO. Second, we compare these results with a state-of-the-art method improving fairness generalization (Cotter et al. 2018, 2019a). This approach possesses scalability and applicability properties similar to ours. It is also implemented using TensorFlow Constrained Optimization, which enables a direct comparison with our heuristic method.

5.2.1 Setup

For these experiments, we build on the setup of Cotter et al. (2019a) and compare the following approaches:

  • unconstrained trains a model without enforcing fairness constraints (hence only minimizing training error).

  • baseline is the fair learning approach based on the implementation of Cotter et al. (2019b) for non-convex constrained optimization (minimizing training error subject to fairness constraints).

  • validation is the approach described in Cotter et al. (2018, 2019a), which is proposed to improve fairness generalization over the baseline approach. In a nutshell, to avoid constraint overfitting, the training set is split into two distinct sets: train and validation. Then, the Lagrangians of Eq. (16) are computed on these two different sets. On the one hand, the \(\theta\)-player optimizes the model parameters over train. On the other hand, the \(\lambda\)-player measures fairness constraints violations on validation.

  • dromasks-n is the integration of our method into baseline, using \(n{}\) masks (in practice, we use n \(\in \{10, 30, 50\}\)).

All methods are implemented using the TensorFlow Constrained Optimization library (Cotter et al. 2019b). Similar to Cotter et al. (2019a), we evaluate all methods using two formulations: the Proxy Lagrangian described in Eq. (16) (Algorithm 2 of Cotter et al. (2019b)) and the usual Lagrangian (Algorithm 3 of Cotter et al. (2019b)).

For baseline, validation and dromasks-n, the result of the training is a sequence of iterates. Following Cotter et al. (2019a), we use their “shrinking” procedure to find the best stochastic classifier supported on this sequence of iterates. We average all the results obtained over 100 runs. For unconstrained and baseline, the runs differ by the random seed used to generate the mini-batches. For validation, the runs differ by the seed used to generate the training/validation split, and for \({{\texttt {dromasks-}}}n\), they differ by the seed used to generate the random binary masks. All methods see exactly the same training data at each run and are evaluated over the same testing set.

We extend the setup of Cotter et al. (2019a), considering four experiments using different datasets and notions of fairnessFootnote 10. We train neural network models with one hidden layer containing 50 ReLU neurons. All models are trained using mini-batches of 100 instances. For each experiment, we measure the training and test errors and the maximum fairness constraint violations. Larger reported values indicate larger violations (values \(\le 0\) correspond to no fairness violation). For the validation method, the training error is computed on the training subset while the training constraint violation is computed on the validation subset, similarly to Cotter et al. (2019a).

Experimentation 1 uses the UCI Adult datasetFootnote 11 (Frank and Asuncion 2010), preprocessed to include only binary attributes, with the classification task being to predict whether a person’s yearly income is greater than $50,000, subject to the 80% rule for Statistical Parity. This means that for each of four overlapping protected classes (Black, White, Female and Male), the Positive Prediction Rate must be at least 80% of the overall Positive Prediction Rate. We use the designated training/testing split, with 32,561 instances for training and 16,281 for testing. The dataset has 122 attributes with the model being trained for 40 epochs.

Experimentation 2 uses the ProPublica COMPAS datasetFootnote 12 (analyzed by Angwin et al. (2016)), preprocessed to include only binary attributes, with the classification task being to predict recidivism, subject to Equal Opportunity fairness constraints. This means that for each of four overlapping protected classes (Black, White, Female and Male), the Positive Prediction Rate on the positively-labeled examples (True Positive Rate) must be at most \(5\%\) higher than the overall Positive Prediction Rate on positively-labeled examples. We use a designated training/testing split, with 4,134 instances for training and 2,038 for testing. The dataset has 32 attributes with all models being trained for 100 epochs.

Experimentation 3 uses the UCI Bank Marketing datasetFootnote 13 (Moro et al. 2014), with the classification task being to predict subscription to a term deposit, subject to Predictive Equality fairness constraints. We use a version of the dataset with 16 attributes, among which 10 are categorical and 6 are numerical. We form four protected groups based on the quartiles of the real-valued age attribute and constrain each group’s false positive rate to be no larger than that of the full dataset. We use a designated training/testing split, with 30,292 instances for training and 14,919 for testing. While the 6 real attributes are left unchanged (i.e., they are not discretized, as both TFCO and our proposed framework can handle them directly), we one-hot encode categorical ones. We also apply a standard preprocessing (centering and scaling) to all features. The resulting dataset contains 51 attributes. All models are trained for 200 epochs.

Experimentation 4 uses the Default of Credit Card Clients datasetFootnote 14 (Yeh and hui Lien 2009), which contains 23 numerical attributes. Among them, 9 are integer attributes representing categories and 14 are real attributes. We do not discretize them, as TFCO is able to handle continuous features and our proposed framework imposes no limitation on the attributes’ nature. Here also, we apply a standard preprocessing (centering and scaling) to all features. There are 30,000 examples in the dataset. We generate 100 random splits, using two thirds of the dataset for training and one third for testing. We form four overlapping protected groups, based on the values of the attributes gender (Male or Female) and age (Young or Old, based on the median value). The classification task is to predict whether a client will default in payment. However, the dataset is highly unbalanced, with about \(78\%\) negative examples. Hence, machine learning models can reach a high predictive accuracy without accurately detecting positive examples. For this reason, we enforce that the True Positive Rate of each protected group is at least \(50\%\). This may slightly increase the overall error (because we might increase the False Positive Rates), but allows detecting more positive examples. Remark that such constraints do not follow the traditional statistical fairness formulation; nevertheless, we show that our approach is still able to improve their generalization. The models are trained for 100 epochs.

Table 2 Error rates and maximum fairness constraint violations for all compared methods, for our four experiments (all values are averaged over 100 runs, as described in the setup). Best test results are shown in bold, second best in italics

5.2.2 Results

Table 2 summarizes the results obtained and shows that our method effectively improves fairness generalization without significantly penalizing accuracy. Overall, the method is competitive with the state-of-the-art validation method without requiring a prior split of the data. Results of Experimentation 1 on Adult Income demonstrate that the fairness constraints violations on the test set are smallest with our method. In addition, only the dromasks-n techniques are able to meet the fairness constraints on the test set. Furthermore, increasing the number of masks seems to improve fairness generalization while penalizing accuracy, which suggests a fairness robustness/accuracy trade-off. While the validation method also achieves a substantial reduction of the test fairness violation, dromasks-n gives better results on these experiments while conflicting less with accuracy (which was expected, as in the validation approach each player only sees half of the data during training). Results for Experimentation 2 (on the COMPAS dataset) suggest that, in some situations, the fact that our approach does not use a separate validation set (but subsets of the same training data) can limit its generalization improvement abilities. However, compared to validation, it has a considerably smaller impact on accuracy, and the resulting trade-offs appear competitive overall. Additionally, we observe that enforcing fairness constraints in a robust manner can improve error generalization when the metric used (here, Equal Opportunity) is aligned with accuracy. Hence, ensuring fairness robustness may also benefit accuracy. This is also observed in the third experiment, on the Bank Marketing dataset, in which some fair models are also more accurate at testing time. Overall, the Lagrangian algorithm appears more suitable for our method. Indeed, we observe that enforcing our proxy constraints (as done with the Proxy Lagrangian algorithm) may not always lead to significant generalization improvements using our masks method. This is particularly clear in the fourth experiment, using the Default of Credit Card Clients dataset.

Overall, our method, combined with the Lagrangian algorithm, leads to the largest improvements in the generalization of the fairness constraints violations, while having limited impact on accuracy.

6 Conclusion

We proposed a novel formulation of robustness for fair learning, aimed at enhancing statistical fairness generalization in machine learning. Our framework is metric-agnostic and based on the idea that one wants to learn a model whose fairness holds even if the training dataset sampling were somehow different. The formulation is designed to be widely applicable, as many real-world machine learning applications consider finite training sets. In addition, the proposed method can be used both to audit any classifier’s fairness robustness without any knowledge of the classifier’s structure and for robust fair learning, although the latter has some practical limitations. To deal with this issue, we proposed an effective and efficient heuristic method, exhibiting practical advantages while still improving fairness sample-robustness and fairness generalization.

A limitation of our framework is that it considers only subsets of the training set (and not all possible sample sets within a given Jaccard distance). On the other hand, this restriction prevents the creation of unrealistic sample sets, which could otherwise over-constrain the problem. It also gives an interesting structure to our perturbation sets, allowing the derivation of several theoretical properties, and leads to an important computational advantage: fairness sample-robustness audit can be performed by solving an integer programming model whose objective function is linear in the decision variables. In a more general formulation of sample robustness, this would no longer be the case, as the denominator of the Jaccard distance would no longer be a constant. Formulating and solving this more general problem efficiently is a promising direction, as is studying the theoretical and empirical privacy implications of our sample-robustness formulation for fairness.
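To illustrate why restricting to subsets keeps the audit tractable: for \(S' \subseteq S\), the union in the Jaccard distance is \(S\) itself, so the distance reduces to the fraction of dropped examples, a linear function of 0/1 drop variables. A small, purely illustrative sketch (index sets stand in for datasets):

```python
def jaccard_distance(s, s_prime):
    s, s_prime = set(s), set(s_prime)
    return 1.0 - len(s & s_prime) / len(s | s_prime)

S = set(range(10))        # indices of the training examples
S_sub = S - {0, 3}        # a subset: two examples dropped

# For a subset, |S U S'| = |S| is constant, so the distance is simply
# (number of dropped examples) / |S|.
assert jaccard_distance(S, S_sub) == 2 / len(S)

# For an arbitrary sample set (e.g., containing new examples), the denominator
# |S U S'| depends on the chosen set, and this linearity is lost.
S_other = (S - {0, 3}) | {100, 101}
print(jaccard_distance(S, S_other))   # 4/12: the denominator is now |S U S'| = 12
```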

Finally, automatically determining the best parameters for our heuristic method (i.e., distribution and number of the binary masks, and cardinalities of the defined subsets) is also a research avenue that we want to pursue in the future.
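As a starting point for such an exploration, one simple way to draw the masks is to sample fixed-cardinality subsets uniformly at random; this is an assumption for illustration, not the exact scheme used in our experiments.

```python
import numpy as np

def draw_masks(n_examples, n_masks, subset_size, seed=0):
    """Draw `n_masks` binary masks, each keeping `subset_size` training examples."""
    rng = np.random.default_rng(seed)
    masks = np.zeros((n_masks, n_examples), dtype=bool)
    for k in range(n_masks):
        kept = rng.choice(n_examples, size=subset_size, replace=False)
        masks[k, kept] = True
    return masks

# e.g., 10 masks each keeping 90% of a 4,134-example training set:
# masks = draw_masks(4134, n_masks=10, subset_size=int(0.9 * 4134))
# The fairness constraints can then be enforced over each masked subset,
# in the spirit of ensuring fairness over a variety of samplings of the training set.
```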