
1 Introduction

With the ever-growing complexity of machine learning models and their widespread deployment, understanding models’ decisions and behavior has become a necessity. Therefore, explainable artificial intelligence (XAI), the field that aims to understand and clarify models, has flourished with a wide diversity of methods. Several taxonomies have been proposed to differentiate between methods, with common components identified [2, 4, 50]: i) Local vs global: Local methods explain specific model decisions (in this case, the model’s input is called the studied sample or query), while global methods provide insight into overall model behavior. ii) Post-hoc vs intrinsic vs explainable by-design: Post-hoc methods are applied to trained models, while by-design methods produce inherently explainable models. Intrinsic methods take the model training into account without affecting its final state. iii) Black-box vs white-box: White-box methods require access to model weights/gradients, while black-box methods only need access to model outputs. iv) Explanation formats, which include: attribution methods [33, 104], concepts [35, 66], surrogate models [67, 96], rule-based explanations [114], natural language explanations [19], dependencies [40, 49], and example-based explanations [57, 113].

Fig. 1.

Natural example-based explanation formats with respect to the query and the decision boundary. Similar examples are the closest elements to the query, while counterfactuals and semi-factuals lie on either side of the point of the decision boundary closest to the query. Prototypes are representative of each class in a dense zone of the dataset, and the influential instance bends the decision boundary.

Nonetheless, no matter the taxonomy of a method, its explanations are aimed at humans; hence, they should exploit the vast literature in philosophy, psychology, and cognitive science on how humans generate, understand, and react to explanations [79]. The psychology literature argues that, in everyday life, humans use examples as references to understand or explain something, or to demonstrate their arguments [17, 32, 38, 79, 100]. Subsequently, through user studies in the XAI field [35, 51, 61], researchers validated that example-based explainability provides better explanations than several other formats. Example-based XAI corresponds to a family of methods where explanations are represented by, or communicated through, samples or parts of samples such as crops.

However, previous surveys on example-based XAI are either cursory, as they cover XAI in general [2, 4], or focus on a specific subset such as factual methods [26, 28, 102] or contrastive explanations [59, 84, 113]. In fact, example-based explainability can be divided into several sub-formats with many similarities. As such, covering them together allows conclusions from sub-fields of the literature to serve one another. Thus, we believe a single work thoroughly mapping, describing, and analyzing each example-based XAI sub-format will benefit the field. Besides, this survey only covers natural example-based explainability methods – i.e. methods where examples are training samples and are not generated. Indeed, to generate high-dimensional data points, methods essentially rely on deep neural networks [6, 62]. Nevertheless, for most high-dimensional data, such approaches fail to ensure that generated examples are plausible and belong to the manifold (the subspace of the input space where samples follow the data distribution), and examples need to be realistic for humans to interpret them [18]. Therefore, natural examples have two advantages: they do not use a model to explain another model, which eases their acceptance, and they are plausible by definition. In addition, apart from formats with only generative methods (such as feature visualizations [91]), we do not set aside any format of example-based XAI, as they may all bring new perspectives to others. Lastly, to navigate through the different formats, we use the semantic definition of each format, as it highlights the differences between formats. In some cases, examples from different formats may be the same sample; hence, clear semantic definitions are necessary to interpret examples.

Explanations in example-based explainability are all data points, but a given example can carry different semantic meanings. Depending on the relation between the example, the query, and the model, the information provided by the example will differ. The semantic definition of an example and the kind of insight it provides divide the example-based format into sub-groups, which are presented in Fig. 1. This overview is organized around those sub-groups (also called formats); this work unfolds as follows:

The first format is similar examples (or factuals) (Sect. 2): for the model, they are the closest elements to the query. Factuals give confidence in the prediction or explain misclassifications, but they are limited to the close neighborhood of the considered sample. To provide insight into the model behavior on a larger zone around the query, counterfactuals and semi-factuals (Sects. 3.1 and 3.2) are better suited. They are respectively the closest sample on which the model makes a different prediction and the farthest sample on which the model makes the same prediction. They are mainly used in classification, give insight into the decision boundary, and are complementary when paired. While they give an idea of where the decision boundary lies, they do not provide insights on how one could bend the decision boundaries of the model by altering the training data. This is addressed through influential instances (Sect. 4), the training samples with the highest impact on the model’s state. In addition, contrary to the previously listed example-based formats, influential instances are not limited to local explanations. Indeed, one can extract the most influential instances for the model in general. Another global explanation format is prototypes (Sect. 5), which are a set of samples representative of either the dataset or a class. Most of the time they are selected without relying on the model and give an overview of the dataset, but some models are designed through prototypes and are thus explainable by design. Concepts (Sect. 6), a closely related format, are also investigated. A concept is the abstraction of the common elements between samples – e.g. for trees, the concepts could be trunk, branch, and leaf. To communicate such concepts, if they are not labeled, the easiest way is through examples of such concepts (often parts of samples, such as patches).

Thus, we could summarize the contributions of this paper as follows: i) To the best of our knowledge, we are the first to compile the natural example-based explainability literature in a survey. Previous works either covered the whole XAI literature with a superficial analysis of example-based XAI or focused on a given sub-format of example-based XAI. ii) For each format, we provide simple definitions, semantic meaning, key methods, their comparison, their pros and cons, and examples. We additionally ground formats in the social sciences and depict their cognitive added value when possible. iii) We explore, classify, and describe the available methods in each natural example-based XAI format. We highlight common points and divergences so the reader can understand each method easily, with a focus on key methods (see Table 1).

1.1 Notations

Throughout the paper, methods explain a machine learning model \(h: \mathcal {X} \rightarrow \mathcal {Y}\), with \(\mathcal {X}\) and \(\mathcal {Y}\) being respectively the input and output domains. This model is parameterized by the weights \(\theta \in \varTheta \subseteq \mathbb {R}^d\). If not specified otherwise, h is trained on a training dataset \(\mathcal {D}_{train} \subset (\mathcal {X} \times \mathcal {Y})\) of size n with the help of a loss function \(l: (\mathcal {X} \times \mathcal {Y} \times \varTheta ) \rightarrow \mathbb {R}\). We denote a sample by the tuple \(z = (x,y)\) with \(x \in \mathcal {X}, y \in \mathcal {Y}\). When an index subscript such as i or j is added, e.g. \(z_i\), the sample is assumed to belong to the training dataset. When the subscript “test” is added, \(z_{test}\), the sample does not belong to the training data. When there is no subscript, the sample may or may not belong to the training data. Finally, the empirical risk function is denoted \(\mathcal {L}(\theta ) := \frac{1}{n}\sum _{(x, y) \in \mathcal {D}_{train}} l(x, y, \theta ) = \frac{1}{n}\sum _{z_{j} \in \mathcal {D}_{train}} l(z_{j}, \theta )\), the parameters that minimize this empirical risk are denoted \(\theta ^* := \mathrm{arg\,min}_{\theta } \mathcal {L}(\theta )\), and an estimator of \(\theta ^*\) is denoted \(\hat{\theta }\).

2 Similar Examples

In the XAI literature, similar examples, also referred to as factuals (see Fig. 2), are often used as a way to provide intuitive and interpretable explanations. The core idea is to retrieve the most similar, or the closest, elements in the training set to a sample under investigation \(z_{test}\) and to use them as a way to explain a model’s output. Specifically, Case-Based Reasoning (CBR) is of particular interest as it mimics the way humans draw upon past experiences to navigate novel situations [38, 100]. For example, when learning to play a new video game, individuals do not typically begin from a complete novice level. Instead, they rely on their pre-existing knowledge and skills in manipulating game controllers and draw upon past experiences with similar video games to adapt and apply strategies that have been successful in the past. As described by Aamodt and Plaza [1], a typical CBR cycle can be delineated by four fundamental procedures: i) RETRIEVE: Searching for the most analogous case or cases, ii) REUSE: Employing the information and expertise extracted from that case to address the problem, iii) REVISE: Modifying the proposed solution as necessary, iv) RETAIN: Preserving the pertinent aspects of this encounter that could be beneficial for future problem-solving endeavors. In addition to being intuitive, the cases retrieved by a CBR system for a given prediction are natural explanations for this output.

While CBR systems are a must-know in the XAI literature, we will not review them as they have already been well analyzed, reviewed, motivated, and described many times [26, 28, 102]. Instead, the focus here is on case-based explanations (CBE) [102]. CBE are methods that use CBR to explain other systems, also referred to as twin systems [57, 60]. In particular, explanations of the system under inspection are generally the outcomes of the RETRIEVE functionality of the twinned CBR system, which oftentimes relies on k-nearest neighbor (k-NN) retrieval [24]. The idea behind k-NN is to retrieve the k most similar training samples (cases) to a test sample \(z_{test}\).

2.1 Factual Methods

One of the main challenges with CBE methods is to define similarity. Indeed, there are many ways of defining similarity measures, and different approaches are appropriate for different representations of a training sample [28]. Generally, CBR systems assume that similar input features are likely to produce similar outcomes. Thus, a distance metric defined on those input features yields a similarity measure: the closer two samples are, the more similar they are. One of the simplest is the unweighted Euclidean distance:

$$\begin{aligned} dist(z, z') = ||x - x'||_2 \quad | \quad z = (x, y) \in (\mathcal {X} \times \mathcal {Y}) \end{aligned}$$
(1)

However, where – i.e. in which space – the distance is computed has major implications. As pointed out by Hanawa et al. [46], the input space does not seem to bring information on the internal workings of the model under inspection but rather provides a data-centric analysis. Thus, recent methods instead rely on either computing the distance in a latent space or weighting features for the k-NN algorithm [31].

Computing the distance in a latent space is one possibility to include the model in the similarity measure, which is of utmost importance if we want to explain it, as pointed out by Caruana et al. [20]. Consequently, they suggested applying the Euclidean distance to the last hidden units \({h_{-1}}\) of a trained Deep Neural Network (DNN) as a similarity that considers the model’s predictions:

$$\begin{aligned} dist_{DNN}(z, z') = ||h_{-1}(x) - h_{-1}(x')||_2 \quad | \quad z = (x, y) \in (\mathcal {X} \times \mathcal {Y}) \end{aligned}$$
(2)

Similarly, for convolutional DNN, Papernot and McDaniel [92] and Sani et al. [98] suggested conducting the k-NN search in the latent representation of the network and using a cosine similarity instead.
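To make the retrieval step concrete, the following is a minimal NumPy sketch of factual retrieval: it implements the k-NN search of Eq. (1), and of Eq. (2) when a latent projection is supplied. The `latent` argument is a hypothetical function standing for the model’s last hidden representation \(h_{-1}\), and the toy data is purely illustrative.

```python
import numpy as np

def knn_factuals(x_query, X_train, k=3, latent=None):
    """Retrieve the indices of the k most similar training samples to x_query.

    If `latent` is None, the Euclidean distance is computed in the input space
    (Eq. 1); otherwise it is computed on the representations returned by
    `latent`, e.g. the last hidden layer h_{-1} of a DNN (Eq. 2).
    """
    if latent is not None:  # project into a space that is meaningful for the model
        X_train = np.stack([latent(x) for x in X_train])
        x_query = latent(x_query)
    dists = np.linalg.norm(X_train - x_query, axis=1)  # ||x - x'||_2
    return np.argsort(dists)[:k]  # indices of the k nearest cases

# Toy usage on random data, with and without a (here trivial) latent projection.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))
x_query = rng.normal(size=8)
print(knn_factuals(x_query, X_train, k=3))
print(knn_factuals(x_query, X_train, k=3, latent=lambda x: x[:4]))
```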

Weighting features is another popular paradigm in CBE. For instance, Shin et al. [106] proposed various global weighting schemes – i.e. methods in which the weights assigned to each input’s feature remain constant across all samples as in Eq. (3) – where the weights are computed using the trained network to reveal the input features that were the most relevant for the network’s prediction.

$$\begin{aligned} dist_{features\_weights}(z, z') = ||w(\hat{\theta })^T(x-x')||_{2} \quad | \quad z = (x, y) \in (\mathcal {X} \times \mathcal {Y}) \end{aligned}$$
(3)

Alternatively, Park et al. [93] examined local weighting by considering varying feature weights across the instance space. However, their approach is not post-hoc for DNN. Besides, Nugent et al. [89] also focused on local weighting and proposed a method that can be applied to any black-box model. However, their method involves generating multiple synthetic datasets around a specific sample, which may not be suitable for explaining a large number of samples or high-dimensional inputs. In the same line of work, Kenny and Keane [60, 61] proposed COLE, which suggests a direct k-NN search in the attribution space – i.e. computing saliency maps [7, 107, 110] for all instances and performing a k-NN search in the resulting dataset of attributions. Denoting by \(c(\hat{\theta }, z)\) the attribution map of the sample z for the model parameterized by \(\hat{\theta }\), this gives:

$$\begin{aligned} dist_{COLE}(z, z') = ||c(\hat{\theta }, z) - c(\hat{\theta }, z')||_{2} \end{aligned}$$
(4)

They used three saliency map techniques [7, 107, 110], but nothing prevents one from leveraging any other saliency map technique. However, we should also point out that Fel et al. [34] questioned attribution methods’ ability to truly capture the internal process of DNN. Additionally, in [61], Kenny and Keane proposed using the Hadamard product of the gradient and the input features as a contribution score in the case of DNN with non-linear outputs.
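As an illustration of COLE’s principle, the sketch below performs the k-NN search in an attribution space (Eq. 4). For simplicity, the model is assumed to be a linear scorer \(f(x) = w \cdot x\), for which the gradient-times-input contribution reduces to the Hadamard product \(w \odot x\); for a DNN, \(c(\hat{\theta }, z)\) would be any saliency map.

```python
import numpy as np

def grad_times_input(w, X):
    """Contribution scores for a linear scorer f(x) = w.x: the gradient of f
    w.r.t. the input is w, so 'gradient * input' is simply w * x per sample."""
    return X * w  # shape (n, d)

def cole_neighbors(w, x_query, X_train, k=3):
    """k-NN search in the attribution space (Eq. 4)."""
    C_train = grad_times_input(w, X_train)
    c_query = grad_times_input(w, x_query[None, :])[0]
    dists = np.linalg.norm(C_train - c_query, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
w = rng.normal(size=8)                   # weights of a toy linear model
X_train = rng.normal(size=(100, 8))
x_query = rng.normal(size=8)
print(cole_neighbors(w, x_query, X_train, k=3))
```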

2.2 Conclusions on Similar Examples

Presenting similar examples to an end-user as an explanation for a model’s outcomes has been shown through user studies [53, 114] and psychology [32] to be generally more convincing than other approaches. However, the current limitations of similarity-based XAI are still significant. For instance, computing a relevant distance between \(z_{test}\) and every training data point becomes computationally prohibitive for large datasets. Thankfully, there are efficient search techniques available, as mentioned in the paper by Bhatia et al. [14].

Furthermore, where the distance is computed does have major implications [46]. Consequently, authors have suggested different feature spaces or weighting schemes to investigate, but their relevance to reflect the inner workings of a model remains questionable. In addition, it is still unclear in the literature if one approach prevails over others. In this regard, it is relevant to point out that psychological studies [32, 78, 88, 112] underscore the importance of shared features, overall resemblance, context, and the interplay between perceptual and conceptual factors in similarity judgments. In fact, we can point out that none of the current factual methods leverage all those aspects at once.

Finally, considering the position of retrieved similar examples in relation to a model’s decision boundaries is crucial for relevant explanations. Neglecting this can confuse users if factual examples contradict the model’s prediction. Contrastive explanations address this issue and are discussed in Sect. 3.

3 Contrastive Explanations

Contrastive explanations are a class of explanations that present the consequences of another plausible reality, the repercussions of changes in the model’s input [17, 113]. More simply, they are explanations where we modify the input and observe the reaction of the model’s prediction; the modified input is returned as the explanation, and its meaning depends on the model’s prediction for it. Those methods are mainly post-hoc methods applied to classification models. This includes i) counterfactuals (CF): an imagined alternative to reality about the past, sometimes expressed as “if only ...” or “what if ...” [17], ii) semi-factuals (SF): an imagined alternative that results in the same outcome as reality, sometimes expressed as “even if ...” [17], and iii) adversarial examples (perturbations or attacks) (AP): inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence [41]. Examples of those three formats are provided in Fig. 2 from Kenny and Keane [62].

AP and CF are both perturbations with an expected change in the prediction; they only differ in their goal, as CF attempt to explain the model’s decision while AP are mainly used to evaluate robustness. In fact, AP can be considered CF [115], and for robust models, AP methods can generate interpretable CF [105]. Nonetheless, AP are hardly perceptible perturbations designed to fool the model [111]; therefore, they are generated examples, and those methods will not be further detailed in this work. We can then define CF and SF in a general way, given a distance dist and a conditioned example space \(\mathcal {X}_{cond(h,x)} \subset \mathcal {X}\):

$$\begin{aligned} CF(x_{test}) := \mathop {\mathrm {arg\,min}}\limits _{x \in \mathcal {X}_{cond(h,x_{test})} | h(x) \ne h(x_{test})} dist(x_{test},x) \end{aligned}$$
(5)
$$\begin{aligned} SF(x_{test}) := \mathop {\mathrm {arg\,max}}\limits _{x \in \mathcal {X}_{cond(h,x_{test})} | h(x) = h(x_{test})} dist(x_{test},x) \end{aligned}$$
(6)

For natural CF and SF, the search space is conditioned to the training set, \(\mathcal {X}_{cond(h,x_{test})} = \mathcal {X}_{train}\), the set of training inputs. For AP, there is no condition on the input space: in Eq. (5), \(\mathcal {X}_{cond(h,x_{test})} = \mathcal {X}\). The distance and the condition on the input space are the key differences between CF and SF methods.
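The sketch below is a minimal implementation of natural CF and SF retrieval following Eqs. (5) and (6), with the search space restricted to the training inputs and an unweighted Euclidean distance; `predict` stands for any classifier h, and the toy threshold model is an assumption for illustration.

```python
import numpy as np

def natural_cf_sf(x_query, X_train, predict, dist=None):
    """Natural counterfactual (Eq. 5) and semi-factual (Eq. 6) for x_query.

    The search space is restricted to the training inputs; `predict` is the
    classifier h and `dist` defaults to the Euclidean distance.
    """
    if dist is None:
        dist = lambda a, b: np.linalg.norm(a - b)
    y_query = predict(x_query)
    preds = np.array([predict(x) for x in X_train])
    dists = np.array([dist(x_query, x) for x in X_train])

    unlike = np.where(preds != y_query)[0]  # candidates for the CF
    alike = np.where(preds == y_query)[0]   # candidates for the SF
    cf = unlike[np.argmin(dists[unlike])] if len(unlike) else None
    sf = alike[np.argmax(dists[alike])] if len(alike) else None
    return cf, sf  # indices in X_train

# Toy usage with a threshold classifier on the first feature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
predict = lambda x: int(x[0] > 0)
print(natural_cf_sf(rng.normal(size=4), X_train, predict))
```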

Fig. 2.

Illustration of factuals, SF, and CF from Kenny and Keane [62]. The factual makes us understand the misclassification, while the SF and CF show us how far or close the decision boundary is. The min-edit example corresponds to an AP, as its differences from the query are not visible.

This section discusses both counterfactuals and semi-factuals as they are often treated together in the literature [17, 25, 42, 62]. The literature for both formats is large in social sciences and in XAI for generative methods, hence we will extract key findings before presenting natural example-based methods.

3.1 Counterfactuals

The social science grounding of counterfactuals is deep, both in philosophy and psychology. Indeed, the search for CF’s semantic definition goes back a long time [13, 44, 72] and historically revolves around the notion of cause and effect, sometimes called facts and foils [75, 79]. Halpern and Pearl [44] argued that providing the cause of an event answers the question “Why?” and thus provides a powerful explanation. Moreover, the philosophical literature argued that CF allow us to communicate and understand the causal relation between facts and foils [72, 79]. Psychology also possesses a rich literature regarding CF [17, 97], which has continued to evolve in recent years [18, 59, 80] thanks to the arrival of CF in XAI through Wachter et al. [115]. Humans’ natural use of counterfactuals in many situations was highlighted by Byrne [17]: from amusing fantasy to logical support, they explain the past, prepare the future, modulate emotional experience, and support moral judgments. Furthermore, when people encounter CF they have both the counterfactual and the factual in mind [18]. The insights from philosophy and psychology [18, 80] have shown the pertinence and potential of CF as well as SF for XAI. To match such promises, CF in XAI need to satisfy the definitions and properties of CF typically employed by humans.

Expected properties for natural CF can be extrapolated from conclusions and properties discovered in XAI for generated CF, even though the literature on natural CF is slim. Such desirable properties for CF, derived from social sciences, can be summarized as follows: i) plausibility [58, 59, 113]: CF should be as realistic as possible; ii) validity [84]: the model’s prediction on the CF should differ from the prediction on the query (see definition (5)); iii) sparsity [58, 84, 113]: the number of features changed between the CF and the query should be as small as possible; iv) diversity [54, 84]: if several CF are proposed, they should be different from each other; v) actionability [58, 113]: the method should allow the user to select the features that can be modified and to specify immutable ones; vi) proximity [54, 58, 59, 84]: CF should be as close as possible to the query.

Counterfactual Methods: Keane et al. [59] argued that nearest unlike neighbors (NUN) [27], a derivative of nearest neighbors [24], are the ancestors of counterfactuals in XAI. NUN are the nearest elements to the query that belong to a different class. They are natural CF when the class is given by the model prediction. Natural counterfactuals and semi-factuals face the same discussions around similarity as in the factuals section (Sect. 2). However, here, the similarity should also take sparsity into account.

NUN were first used in XAI by Doyle et al. [29, 90], but not as an explanation, only to find SF. The only method, to the best of our knowledge, that uses NUN as explanations is KLEOR from Cummins and Bridge [25]; they provided it as a complement to SF explanations to give intuition on the decision boundary. Nonetheless, they highlighted that the decision boundary might be much more complex than what the SF and CF pairs can reveal. Indeed, a line between SF and CF may intersect the decision boundary several times, which can lead to explanations that are not always faithful. Furthermore, Keane et al. [59] argued that “good natural counterfactuals are hard to find”, as the dataset’s low density may prevent sparse and proximal natural CF.

Counterfactuals as known in XAI were introduced by Wachter et al. [115] and flourished through generative methods, as shown by the numerous surveys [54, 84, 113]. Two periods emerge: one focused on interpretable tabular data [113], and the other on complex data like images [6, 62]. While generating plausible instances was not an issue in the first period, it remains challenging in the second, even with diffusion models [6]. More research is needed to explore natural counterfactuals with their inherent plausibility [59, 113]. Moreover, adversarial perturbations proved that for non-robust DNN, a generated example close to a natural instance is not necessarily plausible.

To conclude on counterfactuals, their large literature produced expected properties with deep social science grounding. Such desiderata highlight the pros and cons of generative versus natural CF. Indeed, for high-dimensional data, the reader is faced with the choice between simple and plausible natural CF and proximal and sparse generated CF, obtained through a model explaining another model.

3.2 Semi-factuals

SF literature is most of the time included in the CF literature, be it in philosophy [42], psychology [17], or XAI [25, 62]. In fact, SF (“even if ...”) are semantically close to CF (“what if ...”) [5, 13, 42] (see Eqs. (5) and (6)). However, psychology has demonstrated that human reactions differ between CF and SF. While CF strengthen the causal link between two elements, SF reduce it [18]; CF increase fault and blame in a moral judgment, while SF diminish them.

Expected properties for CF and SF were inspired by social science; hence, because of their close semantic definitions, many properties are common to both: SF should also respect their definition in Eq. (6) (validity); then, to make the comparison possible and relevant, they should aim towards plausibility [5], sparsity [5], diversity, and actionability. Nonetheless, the psychological impacts of CF and SF differ; hence, there are also SF properties that contrast with CF properties. The difference between Eqs. (5) and (6) – i.e. \(\mathop {\mathrm {arg\,min}}\limits \) vs \(\mathop {\mathrm {arg\,max}}\limits \) – suggests that, to replace CF’s proximity, SF should be the farthest from the studied sample while not crossing the decision boundary [25]. As such, we propose decision boundary closeness as a necessary property, and a metric to evaluate it could be the distance between the SF and the SF’s NUN. Finally, SF should not go in any direction from the studied sample but aim toward the closest decision boundary. Therefore, SF should be aligned with the NUN [25, 29, 90]; as this property was not named, we suggest calling it counterfactual alignment.
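As a minimal sketch of this decision boundary closeness metric, and under the same simple assumptions as the previous sketch (a generic `predict` classifier and a Euclidean distance), the score is the distance between the SF and its nearest unlike neighbor:

```python
import numpy as np

def boundary_closeness(x_sf, X_train, predict):
    """Distance between a semi-factual and its nearest unlike neighbor (NUN):
    the smaller the value, the closer the SF is to the decision boundary."""
    y_sf = predict(x_sf)
    unlike = [x for x in X_train if predict(x) != y_sf]
    nun = min(unlike, key=lambda x: np.linalg.norm(x - x_sf))
    return np.linalg.norm(nun - x_sf)

# Toy usage with a threshold classifier on the first feature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
predict = lambda x: int(x[0] > 0)
print(boundary_closeness(X_train[0], X_train, predict))
```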

Semi-factual methods were first reviewed in XAI by a recent survey from Aryal and Keane [5]. They divided SF methods and history into four parts. Each of the first three categories is illustrated below by one known method:

  • SF based on feature-utility: Doyle et al. [29] discovered that similar examples may not be the best explanations and suggested giving examples farther from the studied sample. To find the best explanation case, dist in Eq. (6) is a utility evaluation based on feature differences.

  • NUN-related SF: Cummins and Bridge [25] proposed KLEOR, where Eq. (6)’s dist is based on NUN similarity. They then penalize this distance to make sure the SF lies between the query and the nearest unlike neighbor.

  • SF near local-region boundaries: Nugent et al. [90] approximate the decision boundary of the model in the neighborhood of the studied sample through input perturbations (like LIME [96]). SF are then given by the points closest to the decision boundary.

  • The modern era: post-2020 methods: inspired by CF methods, many generative approaches have emerged in recent years [55, 62].

To conclude, semi-factuals are a natural evolution of factuals. Moreover, their complementarity with counterfactuals is exposed throughout the literature, first to find and evaluate SF, then to provide a range around the decision boundary. Finally, generative and natural SF possess the same pros and cons as their CF counterparts.

Even though contrastive explanations bring insights into a model’s behavior, they say nothing about what led to the current model state or how to change it. In contrast, influential instances (see Sect. 4) extract the samples with the most influence on the model’s training: removing such samples from the training set would have a huge impact on the resulting model.

4 Influential Examples

Influential instances can be defined as the instances most likely to change a model’s outcome if they were not in the training dataset. Furthermore, such measures of influence provide information on “in which direction” the model decision would have been affected if that point was removed. Being able to trace back the most influential training samples for a given test sample \(z_{test}\) has been a topic of interest mainly for example-based XAI.

4.1 Influential Instances Methods

Influence functions originated from robust statistics in the early 1970s. In essence, they evaluate the change in a model’s parameters as we up-weight a training sample by an infinitesimal amount [45]: \(\hat{\theta }_{\epsilon , z_j} := \mathrm{arg\,min}_{\theta } \mathcal {L}(\theta ) + \epsilon l(z_j, \theta )\). One way to estimate the change in a model’s parameters caused by a single training sample would be to perform Leave-One-Out (LOO) retraining, that is, to train the model again with the sample of interest held out of the training dataset. However, repeatedly re-training the model to exactly retrieve the parameters’ changes could be computationally prohibitive, especially when the dataset size and/or the number of parameters grows. As removing a sample \(z_j\) can be linearly approximated by up-weighting it by \(\epsilon = -\frac{1}{n}\), computing influence helps to estimate the change in a model’s parameters if a specific training point was removed. Thus, under the assumption that the empirical risk \(\mathcal {L}\) is twice-differentiable and strictly convex with respect to the model’s parameters \(\theta \), which makes the Hessian \(H_{\hat{\theta }} := \frac{1}{n}\sum _{z_{i} \in \mathcal {D}_{train}}\nabla ^{2}_{\theta }l(z_i, \hat{\theta })\) positive definite, Cook and Weisberg [23] proposed to compute the influence of \(z_j\) on the parameters \(\hat{\theta }\) as:

$$\begin{aligned} \mathcal {I}(z_j) := -H_{\hat{\theta }}^{-1}\nabla _{\theta }l(z_j,\hat{\theta }) \end{aligned}$$
(7)

Later, Koh and Liang [68] popularized influence functions in the machine learning community, as they took advantage of auto-differentiation frameworks to efficiently compute the Hessian for DNN and derived Eq. (7) to formulate the influence of up-weighting a training sample \(z_j\) on the loss at a test point \(z_{test}\):

$$\begin{aligned} \textrm{IF}(z_j, z_{test}) := -\nabla _{\theta }l(z_{test}, \hat{\theta })^{T} H_{\hat{\theta }}^{-1}\nabla _{\theta }l(z_j,\hat{\theta }) \end{aligned}$$
(8)

This formulation opened its way into example-based XAI as it is comparable to finding the nearest neighbors of \(z_{test}\) in the training dataset – i.e. the most similar examples (Sect. 2) – with two major differences though: i) points with high training loss are given more influence, revealing that outliers can dominate the model parameters [68], and ii) \(H_{\hat{\theta }}^{-1}\) measures what Koh and Liang called the resistance of the other training points to the removal of \(z_j\) [68]. However, it should be noted that Hessian computation remains a significant challenge, which can be alleviated with common techniques [3, 77, 101]. By normalizing Eq. (8), Barshan et al. [10] further added stability to the formulation.
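To make Eqs. (7) and (8) concrete, here is a hedged sketch computing exact influence scores for a small L2-regularized linear regression, where the Hessian of the empirical risk can be formed and inverted explicitly; for DNN, the inverse-Hessian-vector products would instead be approximated [3, 77, 101]. The regularizer is folded into the Hessian to keep it positive definite, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1e-2
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Empirical risk minimizer for l(z, theta) = 0.5 * (x.theta - y)^2 with an L2 penalty.
H = X.T @ X / n + lam * np.eye(d)            # Hessian of the (regularized) risk
theta_hat = np.linalg.solve(H, X.T @ y / n)  # closed-form theta_hat
H_inv = np.linalg.inv(H)

def grad_loss(x, y_true):
    """Per-sample gradient of the loss w.r.t. theta: (x.theta_hat - y) * x."""
    return (x @ theta_hat - y_true) * x

def influence(j, x_test, y_test):
    """IF(z_j, z_test) = -grad l(z_test)^T  H^{-1}  grad l(z_j)   (Eq. 8)."""
    return -grad_loss(x_test, y_test) @ H_inv @ grad_loss(X[j], y[j])

x_test, y_test = rng.normal(size=d), 0.0
scores = np.array([influence(j, x_test, y_test) for j in range(n)])
print("training points with largest |influence|:", np.argsort(-np.abs(scores))[:5])
```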

Oftentimes, we are not only interested in the influence of individual instances but also in the influence of a group of training samples (e.g. mini-batch effect, multi-source data, etc.). Koh et al. [69] suggested that the sum of individual influences constitutes a good proxy to rank groups in terms of influence. Basu et al. [12], on their side, suggested using a second-order approximation to capture possible cross-correlations, but specified that it is most likely impracticable for DNN. In a later work, Basu et al. [11] concluded that influence function estimates for DNN are fragile, as the assumptions on which they rely, near-optimality and convexity, do not hold in general for DNN.

LOO approximation is one of the previously mentioned motivations behind influence estimates, as it avoids the prohibitive LOO retraining required for every sample in the training data. Thus, some authors, such as Feldman and Zhang [36], proposed approaches that optimize the number of LOO retrainings necessary to get a grasp of a sample’s influence. Although this significantly reduces the number of retrainings compared to naive LOO retraining, it still requires a significant amount of them. Recently, a new approach related to influence functions, which involves training many models, was introduced with datamodels [52, 99]; we do not review it here.

As Basu et al. [11] pointed out, there is a discrepancy between the LOO approximation and influence function estimates, especially for DNN. However, Bae et al. [9] claimed that this discrepancy is due to influence functions approaching what they call the proximal Bregman response function (PBRF), rather than approximating LOO retraining, which does not interfere with their ability to perform the tasks they were designed for, especially XAI. Thus, they suggested evaluating the quality of influence estimates by comparing them to the PBRF rather than to LOO retraining, as was done until now.

Influence computation that relies on kernels is another paradigm to find the training examples that are the most responsible for a given set of predictions. For instance, Khanna et al. [63] proposed an approach that relies on Fisher kernels, and they related it to the one from Koh and Liang [68] as a generalization of the latter under certain assumptions. Yeh et al. [117] also suggested an approach that leverages kernels, but this time relying on the representer theorem [103]. This allows them to focus on explaining only the pre-activation prediction layer of a DNN for classification tasks. In addition, their influence scores, called representer values, provide supplementary information, with positive representer values being excitatory and negative values being inhibitory. However, this approach requires introducing an L2 regularizer during optimization, which can prevent post-hoc analysis if one is not responsible for the training. Additionally, Sui et al. [109] argued that this approach provides more of a class-level explanation than an instance-level explanation. To address this issue and the L2 regularizer problem, they proposed a method that involves Hessian computation on the classification layer only, with the associated computational cost. However, the ability to retrieve relevant samples when investigating only the final prediction layer was questioned by Feldman and Zhang [36], who found that memorization does not occur in the last layer.

Tracing the training process is another research direction to compute influence scores. It relies on the possibility of replaying the training process by saving checkpoints of the model parameters, or states, and reloading them in a post-hoc fashion [22, 47, 95]. In contrast to the previous approaches, these methods rely neither on near-optimality nor on strong convexity, which is more realistic when we consider the reality of DNN. However, they require handling the training procedure to save the different, potentially numerous, checkpoints; hence, they are intrinsic methods, which in practice is not always feasible.
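As an illustration of this tracing paradigm, here is a hedged sketch of a TracIn-style score [95]: the influence of \(z_j\) on \(z_{test}\) is approximated by summing, over the saved checkpoints, the learning rate times the dot product of their per-sample loss gradients. The checkpoint structure and the `grad_loss` function are assumptions made for the sketch.

```python
import numpy as np

def tracin_score(z_j, z_test, checkpoints, grad_loss):
    """TracIn-style influence: sum over saved checkpoints of
    lr_t * <grad l(z_test, theta_t), grad l(z_j, theta_t)>.

    `checkpoints` is a list of (theta_t, lr_t) pairs saved during training and
    `grad_loss(z, theta)` returns the per-sample loss gradient (assumed API).
    """
    return sum(lr * grad_loss(z_test, theta) @ grad_loss(z_j, theta)
               for theta, lr in checkpoints)

# Toy usage with a linear model whose per-sample loss is 0.5 * (x.theta - y)^2.
grad_loss = lambda z, theta: (z[0] @ theta - z[1]) * z[0]
rng = np.random.default_rng(0)
ckpts = [(rng.normal(size=4), 0.1) for _ in range(3)]     # (theta_t, lr_t) pairs
z_j, z_test = (rng.normal(size=4), 1.0), (rng.normal(size=4), 0.0)
print(tracin_score(z_j, z_test, ckpts, grad_loss))
```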

4.2 Conclusions on Influential Instances

Influential techniques can provide both global and local explanations to enhance model performance. Global explanations allow for the identification of training samples that significantly shape decision boundaries or outliers (see Fig. 1), aiding in data curation. On the other hand, local explanations offer guidance for altering the model in a desired way (see Fig. 3). Although they have been compared to similar examples and have been shown to be more relevant to the model [46], they are more challenging to interpret and their effectiveness for trustworthiness is unclear. Further research, particularly user studies, is necessary to determine their ability to take advantage of human cognitive processes.

Fig. 3.

Figure taken from F. Liu [95]: a tracing process for estimating influence, \(\text {TracIn}\), applied on ImageNet. The first column shows the test sample, the next three columns display the training examples with the most positive influence scores, while the last three columns show the training examples with the most negative influence scores. (fr-bulldog: french-bulldog)

5 Prototypes

Prototypes are a set of representative data instances from the dataset, while criticisms are data instances that are not well represented by those prototypes [64]. Figure 4 shows examples of prototypes and criticisms from the ImageNet dataset.

Fig. 4.

Figure taken from [64]: learned prototypes and criticisms from the ImageNet dataset (two dog breeds)

5.1 Prototype Methods

Prototypes and criticisms can be used to add data-centric interpretability, post-hoc interpretability, or to build an interpretable model [83]. The data-centric approaches are briefly introduced first.

Prototypes for Data-Centric Interpretability: Clustering algorithms that return actual data points as cluster centers, such as k-medoids methods [56, 87], can be used to better understand the data distribution. The cluster centers can then be considered prototypes.

The abundance of large datasets has renewed interest in data summarization methods [8, 73, 74, 82, 108], which consist of finding a small subset of data points that covers a large dataset. The subset elements can be considered prototypes. Additionally, some data summarization methods are based on the Maximum Mean Discrepancy (MMD), such as MMD-critic [64] and Protodash [43], and learn both prototypes and criticisms.
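To give an idea of how such prototypes can be selected, here is a simplified sketch of greedy prototype selection minimizing the squared MMD between the dataset and the prototype set, the criterion underlying MMD-critic [64]; the selection of criticisms (which maximize the MMD witness function) is omitted, and the RBF kernel and toy data are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_prototypes(X, m, gamma=1.0):
    """Greedily pick m data points minimizing the squared MMD between the
    dataset and the prototype set (the criterion behind MMD-critic [64])."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    selected = []
    for _ in range(m):
        best, best_score = None, np.inf
        for c in range(n):
            if c in selected:
                continue
            S = selected + [c]
            # Squared MMD up to the constant term (1/n^2) sum_ii' k(x_i, x_i').
            score = K[np.ix_(S, S)].mean() - 2 * K[:, S].mean()
            if score < best_score:
                best, best_score = c, score
        selected.append(best)
    return selected  # indices of the prototypes

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
print(greedy_prototypes(X, m=4))
```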

Prototypes for Post-hoc Interpretability: Most prototype methods are data-centric and provide no information on the model. However, such methods can be computed in a search space that is meaningful for the model, as done with similar examples in Sect. 2.1, and thus give global explanations reflecting the model’s vision of the dataset. Similarly, local explanations can be extracted by comparing studied samples to the closest prototypes in the search space. However, to our knowledge, only one method explores such a possibility: Filho et al. [37] proposed the M-PEER (Multiobjective Prototype-based Explanation for Regression) method, which finds the prototypes using both the training data and the model output. It jointly optimizes the error of the explainable model along with fidelity and interpretability metrics.

Prototype-Based Models Interpretable by Design: Beyond data-centric and post-hoc methods, there are methods that construct prototype-based models. Those models are interpretable by design because they provide a set of prototypes that make sense for the model; those methods are mainly designed for classification. An interpretable classifier learns a set of prototypes \(P_c \subseteq \{(x,y) \in \mathcal {D}_{train} | y = c\}\). Each \(P_c\) captures the full variability of class c while avoiding confusion with other classes. The learned prototypes are then used by the model to classify the input. We identified three types of prototype-based classifiers:

  • Classifiers resolving set cover problems select convex sets that cover each class with prototypes to represent it. Various types of convex sets such as boxes, balls, convex hulls, and ellipsoids can be used. Class Cover Catch Digraphs (CCCD) [76] and ProtoSelect [15] used balls whose centers are considered prototypes. The nearest-prototype rule is then used to classify data points. CCCD finds, for each class c, a variable number of balls that cover all points of class c and no points of other classes, each radius being chosen as large as possible. However, even within large classes, there can still be a lot of interesting within-class variability that should be taken into account when selecting the prototypes. To overcome this limitation, ProtoSelect used a fixed radius across all points to allow the selection of multiple prototypes for large classes, and they also allow wrongly covered and non-covered points. They simultaneously minimize three elements: i) the number of prototypes; ii) the number of uncovered points; iii) the number of wrongly covered points.

  • Classifiers using Bayesian models for explanation: Kim et al. [65] proposed the Bayesian Case Model (BCM), which extends Latent Dirichlet Allocation [16]. In BCM, the idea is to divide the data into s clusters. For each cluster, a prototype is defined as the sample that maximizes the subspace indicators that characterize the cluster. When a sample is given to BCM, the model yields a vector of probabilities of belonging to each of the s clusters, which can be used for classification. Thus, the classifier uses as input a vector of dimension s, which allows the use of simpler models due to dimensionality reduction. In addition, the prototype of the most likely cluster can then be used as an explanation.

  • Classifiers based on neural networks learn to select prototypes defined in the latent space, which are used for the classification. This leads to a model that is more interpretable than a standard neural network since the reasoning process behind each prediction is “transparent”. Learning Vector Quantization (LVQ) [70] is widely used for generating prototypes as weights in a neural network. However, the use of generated prototypes reduces their interpretability. ProtoPNet [21] also stores prototypes as weights and trains them, but projects them onto representations of training-sample patches during training. Given an input image, its patches are compared to each prototype; the resulting similarity scores are then multiplied by the learned class connections of each prototype (a minimal sketch of this scoring follows the list). ProtoPNet has been extended to time series data via ProSeNet [81], and to more interpretable structures with ProtoTree [86] and HPNet [48]. Instead of using a linear bag-of-prototypes, ProtoTree and HPNet use hierarchically organized prototypes to classify images. ProtoTree improves upon ProtoPNet by using a decision tree, which provides an easy-to-interpret global explanation and can be used to locally explain a single prediction. Each node in this tree contains a prototype (as defined by ProtoPNet), and the similarity scores between image patches and the prototypes are used to determine the routing through the tree. Decision-making is therefore similar to human reasoning [86]. Nauta et al. [85] proposed a method called “This Looks Like That, Because” to understand prototype similarities. This method allows checking why the model considered two examples as similar. For instance, it is possible that a human thinks that the common point between two examples is their color, while the model uses their shape. The method modifies some characteristics of the input image, such as hue or shape, to observe how the similarity score changes. This allows us to measure the importance of each of these characteristics.
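As announced above, the following is a minimal sketch of ProtoPNet-style prototype scoring: each prototype is compared to every patch of the latent feature map, the closest patch yields the prototype’s similarity score, and the scores are combined through the learned class connections. The shapes, the log-based similarity, and the random toy tensors are simplifying assumptions.

```python
import numpy as np

def prototype_logits(feature_map, prototypes, class_connections, eps=1e-4):
    """ProtoPNet-style scoring.

    feature_map:        (n_patches, d) latent patches of the input image
    prototypes:         (n_prototypes, d) learned prototype vectors
    class_connections:  (n_prototypes, n_classes) learned weights
    """
    # Squared distance between every patch and every prototype.
    d2 = ((feature_map[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    d_min = d2.min(axis=0)                       # closest patch for each prototype
    sim = np.log((d_min + 1.0) / (d_min + eps))  # high when some patch matches
    return sim @ class_connections               # class logits

rng = np.random.default_rng(0)
logits = prototype_logits(rng.normal(size=(49, 16)),   # a 7x7 latent feature map
                          rng.normal(size=(10, 16)),   # 10 prototypes
                          rng.normal(size=(10, 3)))    # connections to 3 classes
print(logits)
```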

5.2 Conclusions on Prototypes

Most prototype methods are data-centric, but we have seen that applying such methods in a space that is meaningful for the model can bring post-hoc global and local explanations. A second part of the literature constructs prototype-based classifiers that are explainable by design; those methods are promising and produce models with natural reasoning, but adapting an existing model to such an architecture can be prohibitive.

6 Concept-Based XAI

Prototype-based models compare prototypical parts (e.g. patches) with the studied sample to make the classification. The idea of parts is not new to the literature: the part-based explanation field, developed for fine-grained classification, is able to detect semantically significant parts of images. The first part-based models required labeled parts for training and can be considered object detection with a semantic link between the detected objects. Afterward, unsupervised methods such as OPAM [94] or Particul [116] emerged; those methods still learn classification in a supervised fashion, but no labels are necessary for part identification. In fact, the explanations provided by this kind of method can be assimilated to concept-based explanations. A concept is an abstraction of common elements between samples; as an example, Fig. 5 shows the visualization of six different concepts that the CRAFT method [35] associated with the given image. To communicate parts or concepts, these methods use examples and suppose that, with a few examples, humans are able to identify the concept.

6.1 Concepts Methods

As in part-based XAI, the first concept-based methods used labeled concepts. Kim et al. [66] introduced concept activation vectors (CAV) to represent concepts using a model’s latent space representation of images. They then designed a post-hoc method, TCAV [66], based on CAV, to evaluate an image’s correspondence to a given concept. Even though it seems promising, this method requires prior knowledge of the relevant concepts, along with a labeled dataset of the associated concepts, which is costly and prone to human biases.
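A minimal sketch of the CAV idea under simplifying assumptions: the CAV is taken as the normal of a linear classifier separating the latent activations of concept examples from those of random examples, and a TCAV-like score is the fraction of class samples whose logit gradient (w.r.t. those activations) points in the direction of the CAV. The random activation and gradient matrices are stand-ins for quantities extracted from a real model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """CAV = unit normal of a linear classifier separating concept activations
    from random activations (both of shape (n, d))."""
    X = np.vstack([concept_acts, random_acts])
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(class_grads, cav):
    """Fraction of class samples whose logit gradient w.r.t. the activations
    has a positive directional derivative along the CAV."""
    return float(np.mean(class_grads @ cav > 0))

rng = np.random.default_rng(0)
cav = compute_cav(rng.normal(1.0, 1.0, (50, 16)),   # "concept" activations
                  rng.normal(0.0, 1.0, (50, 16)))   # random activations
grads = rng.normal(0.2, 1.0, (100, 16))             # stand-in logit gradients
print(tcav_score(grads, cav))
```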

Fig. 5.

Illustration from Fel et al. [35]. Natural examples in the colored boxes define a concept. Purple box: could define the concept of “chainsaw”. Blue box: could define the concept of “saw’s motor”. Red box: could define the concept of “jeans”. (Color figure online)

Fortunately, recent works have been conducted to automate concept discovery in the training dataset without humans in the loop. For instance, ACE, proposed by Ghorbani et al. [39], employs a semantic segmentation technique on images belonging to a specific class of interest and uses an Inception-V3 neural network to compute activations of an intermediate model layer for these segments. The resulting activations are then clustered to form a set of prototypes, which they refer to as “concepts”. However, the presence of background segments in these concepts requires a post-processing clean-up step to remove irrelevant and outlier concepts. Zhang et al. [118] proposed an alternative approach to the unsupervised concept discovery problem through matrix factorizations [71] in the networks’ latent spaces. However, such methods operate at the convolutional kernel level, which may lead to concepts based on shape and/or ignore more abstract concepts.

As an answer, Fel et al. [35] proposed CRAFT, which uses Non-Negative Matrix Factorization [71] for concept discovery. In addition to filling the gaps of previous approaches, their method provides an explicit link between the concepts’ global and local explanations (Fig. 5). While their approach alleviates the previously mentioned issues, the retrieved concepts are not always interpretable. Nonetheless, their user study demonstrated the pertinence of the method.
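To give an idea of the factorization step, here is a hedged sketch of NMF-based concept discovery in the spirit of CRAFT: non-negative crop activations A are factorized as \(A \approx UW\), where the rows of W form the concept bank and U gives each crop’s concept coefficients; the crops that most activate a concept serve as its natural examples. CRAFT’s recursive procedure and concept importance estimation are omitted, and the random activations are stand-ins.

```python
import numpy as np
from sklearn.decomposition import NMF

def discover_concepts(activations, n_concepts=5):
    """Factorize non-negative crop activations A (n_crops, d) as A ~ U @ W.
    W's rows are the concept directions, U the crops' concept coefficients."""
    nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500)
    U = nmf.fit_transform(activations)  # (n_crops, n_concepts)
    W = nmf.components_                 # (n_concepts, d)
    return U, W

def concept_examples(U, concept_id, top=5):
    """Indices of the crops that most activate a given concept: these crops
    are the natural examples used to communicate the concept."""
    return np.argsort(-U[:, concept_id])[:top]

rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(200, 64)))  # stand-in non-negative activations
U, W = discover_concepts(A, n_concepts=5)
print(concept_examples(U, concept_id=0))
```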

6.2 Conclusions on Concepts

Concept-based explanations allow post-hoc global and local explanations, by understanding the general concepts associated with a given class and the concepts used for a given decision. We draw attention to methods that do not require expert knowledge to find relevant concepts, as expert-defined concepts are prone to human biases. Even though automated concept discovery is making tremendous progress, the interpretation of such concepts and their ability to gain users’ trust remain questionable, as very few user studies have been conducted on the subject.

7 Conclusions and Discussions

This paper explored the literature on natural example-based explainability and provided a general social science justification for example-based XAI. We described each kind of explanation possible through samples. For each possibility, we reviewed what explanations they bring, then classified and presented the major methods. We summarize all explored methods in Table 1. We saw that all those methods are based on a notion of similarity. As such, for them to explain the model, the similarity between instances should take the model into account. There are two ways of doing so: projecting the instances into a space that is meaningful for the model and/or weighting the instances. Hence, similarity definitions from factuals (Sect. 2.1) can be ported to other formats, and social science groundings could also be shared. However, if the training data is sparse in the search space, finding cases with good properties for a given format may be challenging.

Table 1. Comparison table between the different natural example-based formats and methods. NA: Not applicable, FGCV: Fine-grained computer vision

Among the formats, contrastive explanations, prototypes, and concept examples can be generated, which brings competition to non-generative methods. We argue that both generative and natural examples have their pros and cons. Indeed, natural examples are simple to compute and ensure plausibility while generated examples can be more proximal and sparse but require a model to explain another model (see Sect. 3.1 for properties definitions).

We have illustrated that the different example-based formats bring different kinds of explanations, and that each one has its own advantages, scope of application, and complementarity with the others; Fig. 1 shows their diversity. To summarize those advantages non-exhaustively: i) Factuals give confidence in the decisions of the model and are pertinent in AI-assisted decisions. ii) For classification, contrastive explanations give local insight into the decision boundary. iii) Influential instances explain how samples influenced the model training. iv) Prototypes and concepts give information on the whole model behavior, but may also be used to explain decisions. Nonetheless, like all explanations, we cannot be sure that humans will have a correct understanding of the model or the decision. Furthermore, there is no consensus on how to ensure a given method indeed explains the decisions or inner workings of the model. Moreover, in example-based explainability, the data is used as the explanation; hence, without profound knowledge of the dataset, humans will not be able to draw conclusions through such explanations. Therefore, the evaluation of example-based methods should always include a user study; such studies are scarce in this field and in XAI in general, especially given the lack of availability of, and consensus around, quantitative metrics to evaluate example-based explanations. Finally, we hope our work will motivate, facilitate, and help researchers to keep developing the field of XAI, in particular natural example-based XAI, and to address the identified challenges.