2.1 Introduction

Artificial Intelligence (AI) is nowadays one of the most important scientific and technological areas, with a tremendous socio-economic impact and a pervasive adoption in every field of modern society. High-profile applications such as medical diagnosis, spam filtering, autonomous vehicles, voice assistants, and image recognition are based on AI systems. These AI systems reach their impressive performance mainly through obscure machine learning models that "hide" the logic of their internal decision processes from humans. Black box models are tools used by AI to accomplish a task for which either the logic of the decision process is not accessible, or it is accessible but not human-understandable. Examples of machine learning black box models adopted by AI systems include neural networks, deep neural networks, ensemble classifiers, and SVMs, but also compositions of expert systems, data mining, and hard-coded proprietary software. Non-interpretable machine learning models are chosen for AI systems because of their high performance in terms of accuracy [71]. As a consequence, we have witnessed the rise of a black box society [54], where AI systems adopt obscure decision-making models to carry out their decision processes.

The lack of interpretability of how black box models make decisions and fulfill their tasks is a crucial ethical issue and a limitation to AI adoption in socially sensitive and safety-critical contexts such as healthcare and law. The problem is not only the lack of transparency but also the possible biases that black boxes inherit from prejudices and artifacts hidden in the training data used by the obscure machine learning models. Indeed, machine learning models are built through a learning phase on training data. These training datasets can contain data coming from the digital traces that people produce while performing daily activities, such as purchases, movements, posts in social networks, etc., but also from logs and reports generated in business companies and industries. These "Big Data" can inadvertently contain biases, prejudices, or spurious correlations due to human annotation or to the way they are collected and cleaned. Thus, obscure models may inherit such biases, possibly leading to wrong and unfair decisions. As a consequence, research in eXplainable AI (XAI) has recently attracted much attention [1, 7, 32, 49].

The rest of this chapter is organized as follows. First, Sect. 2.2 presents the theoretical, ethical, and legal motivations for the need for explainable AI. Section 2.3 illustrates the dimensions along which XAI approaches can be distinguished. Then, Sect. 2.4 presents the most common types of explanations and provides some details on the state-of-the-art explanators returning them. Finally, Sect. 2.5 concludes the chapter by discussing the practical usability of XAI methods, explanations in real-world applications, and open research questions.

2.2 Motivations for XAI

Why do we need XAI? In the following, we analyze some real cases showing how and why AI equipped with black box models can be dangerous, both because of the possibility of discrimination and because of the unavailability of justifications after incorrect behaviors.

Prejudices and preconceptions in training datasets can be adopted by machine learning classifiers as general rules to be replicated [56]. Automated discrimination is not necessarily due to black box models. At St. George's Hospital Medical School, London, UK, a program for screening job applicants was used during the 1970s and 1980s. The program used information from candidates without any reference to ethnicity. However, the program was found to discriminate against ethnic minorities and women by inferring this information from surnames and places of birth, and lowering their chances of being selected for interview [44]. A more recent example is related to Amazon. In 2016, the AI software used by Amazon to determine the areas of the USA to which it would offer free same-day delivery inadvertently excluded minority neighborhoods from the program (often when every surrounding neighborhood was allowed). In the same year, the COMPAS score, a predictive model for the "risk of crime recidivism" (a proprietary secret of Northpointe), was shown by the journalists of propublica.org to have a strong ethnic bias. The journalists showed that, according to the COMPAS score, Black defendants who did not re-offend were classified as "high risk" twice as often as White defendants who did not re-offend, while White repeat offenders were classified as "low risk" twice as often as Black repeat offenders.

These kinds of biases are tied to the training data. For example, in [15], it is shown that word embeddings [11] trained on Google News articles exhibit female/male gender stereotypes. Indeed, for the analogy "Man is to computer programmer as woman is to X," the trained obscure model replaced the variable X with "homemaker." The problem lay in the literature and texts used to train the model, which repeatedly associated women with housework. Similarly, in [58], it is shown that a classifier trained to distinguish wolves from husky dogs was basing its predictions solely on the presence of snow in the background, because all the training images containing a wolf had snow in the background. These spurious correlations, biases, and implicit rules hidden in the data, besides discriminating, can also cause wrong and unfair decisions. Unfortunately, in various cases, machine errors could have been avoided if the AI had not been opaque. Access to the reasons behind AI decisions is especially crucial in safety-critical AI systems like medicine and self-driving cars, where a possible erroneous outcome could even lead to the death of people. Consider, for example, the incident in which a self-driving Uber car knocked down and killed a pedestrian in Tempe, Arizona, in 2018. An appropriate XAI method would have helped the company understand the reasons behind the decision and manage its responsibilities.

Precisely to avoid such cases, the European Parliament adopted the General Data Protection Regulation (GDPR), which became enforceable in May 2018 and contains innovative clauses on interpretability for automated decision-making systems. For the first time, the GDPR introduces a right of explanation for all individuals to obtain "meaningful explanations of the logic involved" when automated decision-making takes place. Despite conflicting opinions among legal scholars regarding the real scope of these clauses [27, 47, 73], there is broad agreement that implementing such a principle is crucial, and it is nowadays a big open scientific challenge. Indeed, without a technology able to explain black box models, the right to explanation will remain a "dead letter." How can companies trust their AI products without fully validating and understanding the rationale of their obscure models? And in turn, how can users trust AI services and applications? It would be unthinkable to increase the trust of people and companies in AI without explaining to humans the logic followed by black box models. For these reasons, XAI is at the heart of responsible, open data science across multiple industry sectors and scientific disciplines involving robotics, economics, sociology, and psychology, besides computer science.

2.3 Dimensions of XAI

The goal of XAI is to make AI reasoning interpretable. To interpret means to give or provide the meaning, or to explain and present some concept in understandable terms. Therefore, in AI, interpretability is defined as the ability to explain or to provide the meaning in terms understandable to a human [7, 21]. These definitions assume that the concepts composing an explanation and expressed in understandable terms are self-contained and do not need further explanation. An explanation is an "interface" between a human and an AI, and it is at the same time both human-understandable and an accurate proxy of the AI.

We can identify a set of dimensions for analyzing the interpretability of AI systems that, in turn, reflect the different types of explanations existing in the literature [32]. Some of these dimensions are related to functional requirements of explainable Artificial Intelligence, i.e., requirements that identify the algorithmic adequacy of a particular approach for a specific application, while others are related to operational requirements, i.e., requirements that take into consideration how users interact with an explainable system and what their expectations are. Other dimensions derive from the need for usability criteria from a user perspective, or from the need for guarantees against vulnerability issues. Recently, all these requirements have been analyzed [68] to provide a framework allowing the systematic comparison of explainability methods. In particular, in [68], the authors propose Explainability Fact Sheets that enable researchers and practitioners to assess the capabilities and limitations of a particular explanation method. As an example, given an explanation method m, we can consider the following functional requirements: (i) even though m is designed to explain regressors, can we use it to explain probabilistic classifiers? (ii) Can we employ m on categorical features even though it only works on numerical features? On the other hand, as an operational requirement, we can consider what the function of the explanation is: to provide transparency, to assess fairness, etc.

Besides the detailed requirements illustrated in [68], the literature recognizes a categorization of explanation methods along a few fundamental pillars [1, 32]: (i) black box explanation vs. explanation by design, (ii) global vs. local explanations, and (iii) model-specific vs. model-agnostic explainers. In the following, we present the details of these distinctions and other important features characterizing XAI methods. Figure 2.1 illustrates a summarized ontology of the taxonomy used to classify XAI methods.

Fig. 2.1 A summarized ontology of the taxonomy of XAI methods

Black Box Explanation vs. Explanation by Design

We distinguish between black box explanation and explanation by design. In the first case, the idea is to couple an AI based on a black box model with an explanation method able to interpret the black box decisions. In the second case, the strategy is to substitute the obscure model with a transparent model whose decision process is accessible by design. Figure 2.2 depicts this distinction. Starting from a dataset X, the black box explanation approach maintains the high performance of the obscure model b used by the AI and employs an explanation method f to retrieve an explanation e by reasoning over b and X. This kind of approach is the one most addressed nowadays in the XAI research field [20, 45, 58]. On the other hand, explanation by design consists of directly designing a comprehensible model c over the dataset X, which is interpretable by design and returns an explanation e besides the prediction y. Thus, the idea is to use this transparent model directly in the AI system [61, 62]. In the literature, various models are recognized to be interpretable, for example, decision trees, decision rules, and linear models [24]. These models are considered easily understandable and interpretable for humans. However, nearly all of them sacrifice performance in favor of interpretability. In addition, they cannot be applied effectively to data types such as images or text, but only to tabular, relational data, i.e., tables.

Fig. 2.2 (Top) Black box explanation pipeline. (Bottom) Explanation by design pipeline

Global vs. Local Explanations

We distinguish between global and local explanations depending on whether the explanation allows understanding the whole logic of the model used by the AI system or whether it refers to a specific case, i.e., only a single decision is interpreted. A global explanation provides a way of interpreting any possible decision of a black box model. Generally, the black box behavior is approximated with a transparent model trained to mimic the obscure model while being human-understandable. In other words, the interpretable model approximating the black box provides a global interpretation. Global explanations are quite difficult to achieve and, up to now, can be provided only for AI working on relational data. A local explanation consists in retrieving the reasons for the outcome returned by a black box model for a specific instance. In this case, it is not required to explain the whole logic underlying the AI, but only the reason for the prediction on a specific input instance. Hence, an interpretable model is used to approximate the black box behavior only in the "neighborhood" of the instance analyzed, i.e., only with respect to similar instances. The idea is that in such a neighborhood it is easier to approximate the AI with a simple and understandable model. With regard to Fig. 2.2 (top), a global explanation method f uses many instances X over which the explanation is returned. Figure 2.3 (left) illustrates an example of a global explanation e in the form of a decision tree for a classifier recommending whether or not to play tennis. The overall decision logic captured by the tree says that the classifier makes its recommendation by first looking at the Outlook feature. If its value is Overcast, then the prediction is "not to play." Otherwise, if its value is Sunny, the classifier checks the Humidity feature and recommends "not to play" if the Humidity is High and "to play" if it is Normal. The same reasoning applies to the other branches of the tree. Still with reference to Fig. 2.2 (top), a local explanation method f returns an explanation only for a single instance x. Two examples of local explanations are shown in Fig. 2.3 (right). The local rule-based explanation e for a given record x says that the black box b suggested playing tennis because the Outlook is Sunny and the Humidity is Normal. On the other hand, the explanation e formed by feature importance says that the black box b suggested playing tennis because Outlook has a large positive contribution, Humidity has a considerable negative contribution, and Wind has no contribution to the decision.

Fig. 2.3 Explanation examples in the form of decision tree, decision rule, and feature importance

Interpretable Models for Explaining AI

To explain obscure AI systems or to replace their black box components, interpretable models are often learned. The most widely adopted interpretable models are briefly described in the following. A decision tree exploits a tree-like graph structure composed of internal nodes representing tests on features or attributes (e.g., whether a variable has a value lower than, equal to, or greater than a threshold) and leaf nodes representing decisions, with each branch representing a possible outcome of a test [57]. The paths from the root to the leaves represent the classification rules. The most common rules are if-then rules, where the "if" clause is a combination of conditions on the input variables. If the clause is verified, the "then" part reveals the AI action. For a list of rules, given an ordered set of rules, the AI returns as the decision the consequent of the first rule that is verified [76]. Finally, linear models allow visualizing the feature importance: both the sign and the magnitude of the contribution of each attribute to a given prediction [58]. If the sign of an attribute's contribution is positive, it increases the model's output; otherwise, it decreases it. Higher magnitudes indicate a higher influence on the prediction of the model. Examples of such explanations are illustrated in Fig. 2.3.
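
To make the above concrete, the following minimal sketch (not from the chapter; it assumes scikit-learn and a standard toy dataset) trains a shallow decision tree and a linear model and inspects the resulting rules and coefficients.

```python
# Illustrative only: a shallow decision tree and a linear model as transparent models.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Decision tree: every root-to-leaf path is a readable if-then classification rule.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

# Linear model: sign and magnitude of each coefficient act as feature importance.
linear = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
for name, w in sorted(zip(X.columns, linear[-1].coef_[0]), key=lambda t: -abs(t[1]))[:5]:
    print(f"{name}: {w:+.2f}")
```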

User’s Desiderata

Since interpretable models are required to retrieve explanations, some desiderata should be taken into account when adopting them [24]. Interpretability consists of evaluating to what extent a given explanation is human-understandable. An approach often used for measuring interpretability is the complexity of the interpretable surrogate model. The complexity is generally estimated with a rough approximation related to the size of the interpretable model. For example, the complexity of a rule can be measured by the number of clauses in its condition; for linear models, it is possible to count the number of non-zero weights, while for decision trees the depth of the tree can be used.
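
As a rough illustration (not from the chapter; scikit-learn models are assumed), such complexity proxies can be computed directly from the fitted surrogates.

```python
# Illustrative complexity proxies for interpretable surrogate models.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
print("tree depth:", tree.get_depth(), "- leaves:", tree.get_n_leaves())

lasso = Lasso(alpha=0.05).fit(X, y)                          # sparse linear surrogate
print("non-zero weights:", int(np.sum(lasso.coef_ != 0)))

rule_conditions = ["Outlook = Sunny", "Humidity = Normal"]   # toy rule from Fig. 2.3
print("rule length (clauses):", len(rule_conditions))
```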

Fidelity consists in evaluating to what extent the interpretable model is able to accurately imitate, either globally or locally, the decisions of the AI. Fidelity can be practically measured in terms of accuracy score, F1-score, etc. [71] with respect to the decisions taken by the black box model. The goal of fidelity is to determine the soundness and completeness of explanations.
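
A minimal sketch of how fidelity could be measured follows (assuming scikit-learn; the black box and surrogate below are illustrative choices, not prescribed by the chapter).

```python
# Fidelity: agreement between the surrogate and the black box, not the ground truth.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Train the interpretable surrogate on the black box decisions.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(
    X_train, black_box.predict(X_train))

bb_test, sur_test = black_box.predict(X_test), surrogate.predict(X_test)
print("fidelity (accuracy):", accuracy_score(bb_test, sur_test))
print("fidelity (F1):", f1_score(bb_test, sur_test))
```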

Another important property from the user's point of view is usability: an interactive explanation can be more useful than a textual, static one. However, explanation methods should also satisfy other important requirements such as reliability [42], robustness [34], causality [28], scalability, and generality [55]. Reliability and robustness require that an explanation method maintain certain levels of performance independently of small variations of the parameters or of the input. Causality assumes that controlled changes in the input affect the black box behavior in an expected way, known by the explainer. Generality requires that explanation models be portable to different data (of a similar nature) without special constraints or restrictions. Finally, since most AI systems need "Big Data," it is opportune to have explainers able to scale to large input data.

Moreover, a fundamental aspect is that every explanation should be personalized coherently with the user's background. Different background knowledge and diverse experience in various tasks lead to different notions of, and requirements for, explanations. Domain experts may be able to understand complex explanations, while common users require simple and effective clarifications. Indeed, the meaningfulness and usefulness of an explanation depend on the stakeholder [12]. Taking as an example the aforementioned COMPAS case, a specific explanation for a score may make sense to a judge who wants to understand and double-check the suggestion of the AI support system and possibly discover that it is biased against Blacks. On the other hand, the same explanation is not useful to a prisoner, who cannot change the reality of being Black, whereas he may find meaningful the suggestion that his risk will decrease as he gets older. Moreover, besides these features strictly related to XAI, an interpretable model should satisfy other important general desiderata, for instance, a high accuracy, which consists in evaluating to what extent the model correctly takes decisions for unseen instances.

Model-Specific vs. Model-Agnostic Explainers

We distinguish between model-specific and model-agnostic explanation methods depending on whether the technique adopted to retrieve the explanation acts on a particular model adopted by an AI system or can be used on any type of AI. The most common approach to explaining AI black boxes is known as reverse engineering, so named because the explanation is retrieved by observing what happens to the output, i.e., the AI decision, when the input is changed in a controlled way. An explanation method is model-specific, or not generalizable [48], if it can be used to interpret only particular types of black box models. For example, if an explanation approach is designed to interpret a random forest [71] and internally uses a concept of distance between trees, then such an approach cannot be used to explain the predictions of a neural network. On the other hand, an explanation method is model-agnostic, or generalizable, when it can be used independently of the black box model being explained. In other words, the AI's internal characteristics are not exploited to build the interpretable model approximating the black box behavior.

Time Limitations

The time that the user is allowed to spend, or is willing to spend, on understanding an explanation is a crucial aspect. Obviously, the time availability of a user is strictly related to the scenario where the predictive model has to be used. In contexts where the user needs to make a decision quickly, e.g., surgery or an imminent disaster, it is preferable to have a simple and effective explanation. In contexts where decision time is not a constraint, such as during a procedure to release a loan, one might prefer a more complex and exhaustive explanation.

Safety Desiderata

Explainability methods providing interpretable understanding may reveal partial information about the training data, the internal mechanisms of the models, or their parameters and prediction boundaries [14, 65]. Thus, desiderata such as privacy [52], secrecy, security, and fairness [56] should be considered to avoid skepticism and increase trust. Fairness and privacy are fundamental desiderata to guarantee both the protection of groups against (direct or indirect) discrimination [60] and that the interpretable model does not reveal sensitive information about people [3].

2.4 Explanations and Explanators

Growing research on XAI is bringing to light a wide range of explanations and explanation methods for "opening" black box models. The explanation returned depends on various factors, such as the type of task, the type of data the AI system acts on, who the final user of the explanation is, and whether it explains the whole behavior of the AI system (global explanations) or reveals the reasons for the decision on a specific instance only (local explanations). In this section, we review the most used types of explanations and show how some state-of-the-art explanation methods are able to return them. The interested reader can refer to [1, 32] for a complete review of the XAI literature.

2.4.1 Single Tree Approximation

One of the first approaches introduced to explain neural networks is trepan [20]. trepan is a global explanation method able to model the whole logic of a neural network working on tabular data with a single decision tree. The decision tree returned by trepan as an explanation is a global transparent surrogate. Indeed, every path from the root of the tree to a leaf shows the reasons for the final decision reported in the leaf. An example of a decision tree returned by trepan is illustrated in Fig. 2.4. This global explanation reveals that the black box first focuses on the value of the feature rest ECG and, depending on its value (abnormal, normal, hypertrophy), takes different decisions based on additional factors such as sex or cholesterol. In particular, trepan queries the neural network to induce the decision tree by maximizing the gain ratio [71] on the data with respect to the predictions of the neural network. A weakness of common decision trees like ID3 or C4.5 [57] is that the amount of data available to find the splits near the leaves is much lower than the amount used at the beginning. Thus, in order to capture how the neural network works in detail, trepan synthetically generates data respecting the path conditions of each node before performing the split, so that the same amount of data is used for every split. In addition, it allows flexibility by using "m-of-n" rules, where only m conditions out of n are required to be satisfied to classify a record. Therefore, trepan maximizes the fidelity of the single-tree explanation with respect to the black box decisions. We highlight that even though trepan is proposed to explain neural networks, it does not exploit any internal characteristic of neural networks to retrieve the explanation tree; thus, it can theoretically be employed to explain every type of classifier, i.e., it is model-agnostic.
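
The following is a deliberately simplified sketch of the trepan idea, not the original algorithm: it uses crude marginal sampling instead of per-node distributions and omits m-of-n splits, and it assumes scikit-learn for both the neural network and the surrogate tree.

```python
# Simplified trepan-like global surrogate: query the network on synthetic data
# and induce a single decision tree that mimics its decisions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
black_box = make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                        random_state=0)).fit(X, y)

# Generate extra records by sampling each feature from its empirical marginal.
rng = np.random.default_rng(0)
X_syn = np.column_stack([rng.choice(X[:, j], size=5000) for j in range(X.shape[1])])
X_aug = np.vstack([X, X_syn])
y_aug = black_box.predict(X_aug)                  # labels come from the black box

surrogate = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_aug, y_aug)
print(export_text(surrogate))
print("fidelity:", (surrogate.predict(X) == black_box.predict(X)).mean())
```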

Fig. 2.4 Example of global tree-based explanation returned by trepan

In [16], an extension of trepan is presented that aims to keep the tree explanation simple and compact by introducing four splitting approaches aimed at finding the most relevant features during the tree construction. In [36], genetic programming is used to evolve a single decision tree that approximates the behavior of a neural network ensemble by considering additional genetic features obtained as combinations of the original data and the novel data annotated by the black box models. Both methods described in [16, 36] return explanations in the form of a global decision tree. Interested readers can refer to the papers for more details.

2.4.2 Rules List and Rules Set

A decision rule is generally formed by a set of conditions and a consequent, e.g., if conditions, then consequent. Given a record, a decision rule assigns to the record the outcome specified in the consequent if the conditions are verified [2]. The most common rules are if-then rules, whose conditions are a conjunction of clauses. On the other hand, for m-of-n rules, given a set of n conditions, if m of them are verified, then the consequent of the rule is applied [51]. When a set of rules is used, there are different strategies to select the outcome. For a list of rules, the order of the list is considered, and the model returns the outcome of the first rule whose conditions are verified [76]. For instance, falling rule lists are if-then rules ordered with respect to the probability of a specific outcome [75]. On the other hand, decision sets are unordered lists of rules: each rule is an independent classifier that can assign its outcome without regard to any other rule [39], and voting strategies are used to select the final outcome.
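
As a toy illustration of how an ordered rule list is applied (hypothetical rules inspired by the play-tennis example of Fig. 2.3, not taken from any cited method):

```python
# The outcome of the first rule whose conditions all hold is returned;
# a default outcome is used when no rule fires.
def predict_rule_list(record, rule_list, default):
    for conditions, outcome in rule_list:
        if all(record.get(feat) == value for feat, value in conditions):
            return outcome
    return default

rules = [
    ([("Outlook", "Sunny"), ("Humidity", "High")], "No"),
    ([("Outlook", "Sunny"), ("Humidity", "Normal")], "Yes"),
    ([("Outlook", "Rain"), ("Wind", "Strong")], "No"),
]

x = {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}
print(predict_rule_list(x, rules, default="Yes"))   # -> "Yes"
```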

Lists and sets of rules are adopted as explanations both by explanation methods and by transparent classifiers. In both cases, the reference context is tabular data. In [8], the explanation method rxren unveils with a rule list the logic behind a trained neural network. First, rxren prunes the insignificant input neurons and identifies the data ranges necessary to classify instances with a specific class. Second, rxren generates the classification rules for each class label exploiting the data ranges previously identified, and improves the initial list of rules through a process that prunes and updates the rules. Figure 2.5 shows an example of rules returned by rxren. A survey of techniques for extracting rules from neural networks can be found in [4]. All the approaches in [4], including rxren, are model-specific explainers.

Fig. 2.5 Example of the list of rules explanation returned by rxren

As previously mentioned, an alternative line of research to black box explanation is the design of transparent models for AI systems. The corels method [5] is a technique for building rule lists for discretized tabular datasets. corels provides an optimal and certifiable solution in terms of rule lists. An example of a rule list returned by corels is reported in Fig. 2.6. The rules are read one after the other, and the AI takes the decision of the first rule whose conditions are verified. Decision sets are built by the method presented in [39]. The if-then rules extracted for each set are accurate, non-overlapping, and independent. Since each rule is independently applicable, decision sets are simple, succinct, and easy to interpret. A decision set is extracted by jointly maximizing interpretability and predictive accuracy by means of a two-step approach using frequent itemset mining and a learning method to select the rules. The method proposed in [63] merges local explanation rules into a unique global weighted rule list by using a scoring system.

Fig. 2.6 Example of the list of rules explanation returned by corels

2.4.3 Partial Dependency

Another global XAI method for inspecting the behavior of black box models is the partial dependence plot (pdp). In [32], the black box inspection problem is defined as providing a representation for understanding why the black box returns certain predictions more likely than others with particular inputs. The pdp visually shows the relationship between the AI decision and the input variables in a reduced feature space, clarifying whether the relationship is linear, monotonic, or more complex.

In particular, a pdp shows the marginal effect of a feature on the AI decision [25]. In short, a feature is selected and varied over its domain; instances are then built by combining each value of the selected feature with the values of the other features taken from the given training data. The pdp for a value of the selected feature is the mean classification probability over the training data, or the average regression value. An assumption of the pdp is that the selected feature is not correlated with the other features. Generally, pdp approaches are model-agnostic and used on tabular datasets. In [38], the prospector method, which implements a pdp, is proposed to observe how the output of a black box varies when the input is changed one variable at a time, providing an effective way to understand which features are most important. Figure 2.7 shows the prospector pdp for the feature Age and a black box that predicts the risk of churn. In this example, the prospector pdp shows the marginal effect (black line) of the feature Age on the predicted outcome Risk. In particular, the higher the Age, the higher the probability of Risk of Churn; for Age greater than fifty-five, this probability markedly increases.
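
A minimal model-agnostic sketch of computing a one-feature partial dependence follows (assuming scikit-learn; the black box and dataset are illustrative choices).

```python
# Partial dependence of the black box output on a single feature: force the
# feature to each grid value and average the predicted probability.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

feature = 0                                          # index of the inspected feature
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 20)

for v in grid:
    X_mod = X.copy()
    X_mod[:, feature] = v                            # all other features unchanged
    pd_value = black_box.predict_proba(X_mod)[:, 1].mean()
    print(f"{v:8.2f} -> {pd_value:.3f}")
```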

Fig. 2.7 Example of partial dependence plot

2.4.4 Local Rule-Based Explanation

Despite being useful, global explanations can be inaccurate because interpreting a whole model is complex. On the other hand, even though the overall decision boundary is difficult to explain, locally, in the surroundings of a specific instance, it can be easier. Therefore, a local explanation rule can reveal the factual reasons for the decision taken by the black box of an AI system for a specific instance. The lore method is able to return a local rule-based explanation. lore builds a local decision tree in the neighborhood of the instance analyzed [30]; the neighborhood is generated with a genetic procedure to account for both similarities to and differences from that instance. Then, it extracts from the tree a local rule revealing the reasons for the decision on the specific instance (see the green path in Fig. 2.8). For instance, the explanation of lore for the denied loan request of a customer with "age = 22, race = black, and income = 800" to a bank that uses an AI could be the factual rule if age ≤ 25 and race = black and income ≤ 900 then deny. anchor [59] is another XAI approach for locally explaining black box models with decision rules called anchors. An anchor is a set of features with thresholds indicating the values that are fundamental for obtaining a certain decision of the AI. anchor efficiently explores the black box behavior by generating random instances exploiting a multi-armed bandit formulation.
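
The sketch below conveys the flavor of such local rule-based explanations under strong simplifications: the neighborhood is generated with plain Gaussian perturbations rather than the genetic procedure of lore, and the factual rule is read off the decision path of a shallow local tree (scikit-learn is assumed).

```python
# Simplified local rule extraction: perturb the instance, label the neighborhood
# with the black box, fit a shallow tree, and read the path followed by the instance.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0]                                               # instance to explain
rng = np.random.default_rng(0)
Z = x + rng.normal(scale=0.3 * X.std(axis=0), size=(2000, X.shape[1]))
local_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(
    Z, black_box.predict(Z))

tree_, clauses = local_tree.tree_, []
for node in local_tree.decision_path(x.reshape(1, -1)).indices:
    if tree_.children_left[node] != tree_.children_right[node]:   # internal node
        f, thr = tree_.feature[node], tree_.threshold[node]
        clauses.append(f"{names[f]} {'<=' if x[f] <= thr else '>'} {thr:.2f}")

print("IF " + " AND ".join(clauses) +
      f" THEN class {black_box.predict(x.reshape(1, -1))[0]}")
```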

Fig. 2.8 Example of local factual and counterfactual explanation returned by lore

2.4.5 Feature Importance

A widely adopted form of local explanation, especially for tabular data, is feature importance, which considers both the sign and the magnitude of the contribution of each feature to a given AI decision. If the contribution of a feature is positive, it increases the model's output; if the sign is negative, the feature decreases the output of the model. If a feature has a higher contribution than another, it has a stronger influence on the black box outcome. Feature importance summarizes the decision of the black box model, providing the opportunity to quantify the changes of the black box decision for each test record. Thus, it is possible to identify the features leading to a certain outcome for a specific instance and how much they contributed to the decision.

The lime model-agnostic local explanation method [58] randomly generates synthetic instances around the record analyzed and then returns the feature importance as the coefficients of a linear regression model adopted as a local surrogate. The synthetic instances are weighted according to their proximity to the instance of interest. The Lasso model is trained to approximate the probability of the decision of the black box in the synthetic neighborhood of the instance analyzed. Figure 2.9 shows the feature importance returned by lime (central part of the figure) toward the two classes. In this example, the feature odor=foul has a positive contribution of 0.26 in the prediction of a mushroom as poisonous, stalk-surface-above-ring=silky has a positive contribution of 0.11, spore-print-color=chocolate has a positive contribution of 0.08, stalk-surface-below-ring=silky has a positive contribution of 0.06, while gill-size=broad has a negative contribution of 0.13. Another widely adopted model-agnostic local explanation method is shap [45]. shap connects game theory with local explanations, exploiting the Shapley values of a conditional expectation function of the black box to explain the AI. Shapley values were introduced in [64] as a method for assigning "payouts" to "players" depending on their contribution to the "total payout": players cooperate in a coalition and receive a certain "profit" from this cooperation. The connection with explainability is as follows. The "game" is the decision of the black box for a specific instance. The "profit" is the actual value of the decision for this instance minus the average value over all instances. The "players" are the feature values of the instance that lead toward a certain value, i.e., that collaborate to receive the profit. Thus, a Shapley value is the average marginal contribution of a feature value across all possible coalitions, i.e., combinations [50]. Therefore, shap returns the unique additive local feature importance for each specific record. The higher a Shapley value, the higher the contribution of the feature. Figure 2.10 illustrates an example of shap explanation, where the feature importance is expressed in the form of a force plot. This explanation shows, for each feature, how much it contributes to pushing the black box prediction from the base value (the average model output over the dataset, 24.41 in this example) to the model output. The features pushing the prediction higher are shown in red; those pushing the prediction lower are shown in blue. Under appropriate settings, lime and shap can also be used to explain the decisions of AI working on textual data and images.
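
The following bare-bones sketch mimics the lime idea without using the lime library itself (scikit-learn is assumed; lime additionally performs sparse feature selection and works on an interpretable representation of the instance).

```python
# LIME-like local surrogate: sample around the instance, weight by proximity,
# and read the coefficients of a weighted linear model as local importances.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0]
rng = np.random.default_rng(0)
Z = x + rng.normal(scale=X.std(axis=0), size=(5000, X.shape[1]))
p = black_box.predict_proba(Z)[:, 1]                 # probability to be imitated

# Proximity kernel: closer synthetic points weigh more in the surrogate fit.
dist = np.linalg.norm((Z - x) / X.std(axis=0), axis=1)
weights = np.exp(-(dist ** 2) / (2 * dist.std() ** 2))

surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=weights)
for name, w in sorted(zip(names, surrogate.coef_), key=lambda t: -abs(t[1]))[:5]:
    print(f"{name}: {w:+.4f}")
```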

Fig. 2.9 Example of explanation based on feature importance by lime

Fig. 2.10 Example of explanation based on feature importance by shap

2.4.6 Saliency Maps

The most used type of explanation for AI systems working on images is the saliency map. A saliency map is an image in which each pixel's color represents a value modeling the importance of that pixel for the prediction, i.e., it shows the positive (or negative) contribution of each pixel to the black box outcome. Thus, saliency maps are returned by local explanation methods. In the literature, there exist different explanation methods locally explaining deep neural networks for image classification. The two most used families of model-specific techniques are gradient attribution methods, such as sal [67], grad [66], intg [69], and elrp [9], and perturbation-based attribution methods [23, 77]. Without entering into the details, these XAI approaches aim to assign an importance score to each pixel such that the probability of the deep neural network labeling the image with a different outcome is minimized if only the most important pixels are considered. Indeed, the areas retrieved by these methods are also called attention areas.
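
As a sketch of the simplest gradient attribution scheme (vanilla gradient saliency, in the spirit of sal [67]; the model and image below are hypothetical and PyTorch is assumed):

```python
# Vanilla gradient saliency: the gradient of the predicted class score with
# respect to the input pixels measures each pixel's importance.
import torch

def gradient_saliency(model, image):               # image: tensor of shape [C, H, W]
    model.eval()
    x = image.unsqueeze(0).clone().requires_grad_(True)
    scores = model(x)                               # shape [1, num_classes]
    predicted = scores.argmax(dim=1).item()
    scores[0, predicted].backward()                 # d(score) / d(input)
    return x.grad.abs().max(dim=1).values.squeeze(0)   # max over channels -> [H, W]

# Hypothetical usage, assuming a trained CNN and a preprocessed image tensor:
# saliency = gradient_saliency(trained_cnn, image_tensor)
# (visualize `saliency`, e.g., with matplotlib's imshow)
```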

The aforementioned XAI methods are designed for specific DNN models, i.e., they are model-specific. However, under appropriate image transformations that exploit the concept of "superpixels" [58], model-agnostic explanation methods such as lime, anchor, and lore can also be employed to explain AI working on images, for any type of black box model. The attention areas of the explanations returned by these methods are tied to the technique used for segmenting the image to explain and to a neighborhood consisting of unrealistic synthetic images with "suppressed" superpixels [29]. On the other hand, the local model-agnostic explanation method abele [31] exploits a generative model, i.e., an adversarial autoencoder [46], to produce a realistic synthetic neighborhood that allows retrieving more understandable saliency maps. Indeed, abele's saliency maps highlight the contiguous attention areas that can be varied while maintaining the same classification from the black box used by the AI system. Figure 2.11 reports a comparison of saliency maps for the classification of the handwritten digits "9" and "0" for the explanation methods abele [31], lime [58], sal [67], grad [66], intg [69], and elrp [9].

Fig. 2.11 Example of saliency maps returned by different explanation methods. The first column contains the image analyzed and the label assigned by the black box model b of the AI system

2.4.7 Prototype-Based Explanations

Prototype-based explanation methods return as explanation a selection of particular instances of the dataset to locally explain the behavior of the AI system [50]. Prototypes (or exemplars) make clear to the user the reasons for the AI system's decision. In other words, prototypes are used as a foundation for representing a category or a concept [26]: a concept is represented through a specific instance. Prototypes help humans to construct mental models of the black box model and of the training data used. Prototype-based explainers are generally local methods that can be used for tabular data, images, and text alike. Obviously, prototype-based explanations only make sense if an instance of the data is humanly understandable and meaningful as an explanation. Hence, these methods are particularly useful for images, short texts, and tabular data with few features.

In [13], prototypes are selected as a minimal subset of samples from the training data that can serve as a condensed view of the dataset. Naive approaches for selecting prototypes use the closest neighbors from the training data with respect to a predefined distance function, or the closest centroids returned by a clustering algorithm [71]. In [43], we designed a sophisticated model-specific explanation method that directly encapsulates in a deep neural network architecture an autoencoder and a special prototype layer, where each unit of that layer stores a weight vector that resembles an encoded training input. The autoencoder permits making comparisons within the latent space and visualizing the learned prototypes; besides accuracy and reconstruction error, the optimization has a term that ensures that every encoded input is close to at least one prototype. Thus, the distances in the prototype layer are used for the classification, so that each prediction comes with an explanation corresponding to the closest prototype. In [18], prototypical parts of images are extracted by a ProtoPNet network, such that each classification is driven by prototypical aspects of a class.
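
A naive prototype-selection sketch along the lines of the "closest centroids" approach mentioned above (assuming scikit-learn; this is not the method of [43] or [18]):

```python
# Naive prototypes: cluster the data and keep, for each cluster, the real
# instance closest to the centroid.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

prototypes = []
for c, centroid in enumerate(kmeans.cluster_centers_):
    members = np.where(kmeans.labels_ == c)[0]
    closest = members[np.argmin(np.linalg.norm(X[members] - centroid, axis=1))]
    prototypes.append(closest)

print("prototype indices:", prototypes)
print("prototype labels:", y[prototypes].tolist())
```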

Although prototypes are representative of all the data, sometimes they are not enough to provide evidence for the classification without instances that are not well represented by the set of prototypes [50]. Such instances are named criticisms and help to explain what is not captured by prototypes. In order to aid human interpretability, in [37], prototypes and criticisms are selected by means of the Maximum Mean Discrepancy (mmd): instances in highly dense areas are good prototypes, while instances in regions that are not well explained by the prototypes are selected as criticisms. Finally, the abele method [31] enriches the saliency map explanation with a set of exemplar and counter-exemplar images, i.e., images similar to the one under investigation for which the AI takes the same decision, and images similar to the one explained for which the black box of the AI returns a different decision. The particularity of abele is that it does not select exemplars and counter-exemplars from the training set but generates them synthetically, exploiting an adversarial autoencoder used during the explanation process [40]. An example of exemplars (left) and counter-exemplars (right) is shown in Fig. 2.12.

Fig. 2.12 Example of exemplars (left) and counter-exemplars (right) returned by abele. On top of each (counter-)exemplar is reported the label assigned by the black box model b of the AI system

2.4.8 Counterfactual Explanations

A counterfactual explanation shows what should have been different to change the decision of an AI system. Counterfactual explanations are becoming an essential component in XAI methods and applications [6] because they help people reason on the cause-effect relations between analyzed instances and AI decisions [17]. Indeed, while direct explanations such as feature importance, decision rules, and saliency maps are important for understanding the reasons for a certain decision, a counterfactual explanation reveals what should be different in a given instance to obtain an alternative decision [73]. Thinking in terms of "counterfactuals" requires imagining a hypothetical causal situation that contradicts the observed one [50]. Thus, the "cause" of the situation under investigation is represented by the features describing the situation that "caused" a certain decision, while the "event" models the decision.

The most used types of counterfactual explanations are indeed prototype-based counterfactuals. In [74], counterfactual explanations are provided by an explanation method that solves an optimization problem which, given an instance under analysis, a training dataset, and a black box function, returns an instance similar to the input one with minimum changes, i.e., minimum distance, that reverts the black box outcome. The counterfactual explanation describes the smallest change that can be made in that particular case to obtain a different decision from the AI. In [72], the generation of diverse counterfactuals using mixed integer programming for linear models is proposed. As previously mentioned, abele [31] also returns synthetic counter-exemplar images that highlight the similarities and differences between images leading to the same decision and images leading to other decisions.
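
A deliberately naive counterfactual-search sketch in the spirit of the optimization in [74] follows (not the actual method; scikit-learn is assumed): it greedily perturbs one feature at a time and keeps the smallest normalized change that flips the black box decision.

```python
# Greedy single-feature counterfactual search for a tabular black box.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0]
original = black_box.predict(x.reshape(1, -1))[0]

best = None
for f in range(X.shape[1]):
    for v in np.quantile(X[:, f], np.linspace(0, 1, 21)):      # candidate values
        x_cf = x.copy()
        x_cf[f] = v
        if black_box.predict(x_cf.reshape(1, -1))[0] != original:
            cost = abs(v - x[f]) / (X[:, f].std() + 1e-9)       # normalized change
            if best is None or cost < best[0]:
                best = (cost, f, v)

if best:
    _, f, v = best
    print(f"Change '{names[f]}' from {x[f]:.2f} to {v:.2f} to flip the decision "
          f"away from class {original}.")
else:
    print("No single-feature counterfactual found.")
```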

Another modeling of counterfactual explanations consists of the logical form that describes a causal situation like: "If X had not occurred, Y would not have occurred" [50]. The local model-agnostic lore explanation method [30], besides a factual explanation rule, also provides a set of counterfactual rules that illustrate the logic used by the AI to obtain a different decision with minimum changes. For example, in Fig. 2.8, the set of counterfactual rules is highlighted in purple and shows that if income > 900 then grant, or if race = white then grant, clarifying which particular changes would revert the decision. In [41], a local neighborhood generation method based on a Growing Spheres algorithm is proposed that can be used both for finding counterfactual instances and as a base for extracting counterfactual rules.

2.5 Conclusions

This chapter has discussed the problem of the interpretability of AI-based decision systems, which are typically opaque and hard to understand for humans. In particular, we have analyzed the different dimensions of the problem and the different types of explanations offered by methods proposed by the scientific community. The ability to explain complex AI-based systems is fundamental for the diffusion and adoption of such systems in critical domains. One of the most critical is healthcare, where the question of interpretability is far more than intellectual curiosity. The point is that these systems should be used as a support for physicians, who bear important responsibilities when taking decisions that have a direct impact on the health status of humans. For instance, an XAI system providing details in the form of logical rules or feature importance could be extremely useful to medical experts who have to monitor and predict the disease evolution of a patient (diabetes detection [70], Alzheimer progression [53], etc.) while understanding the reasons for a specific evolution, progression, or complication. Precisely for studying progression and complications, prototype-based explanations and counterfactual explanations can play a crucial role. On the other hand, exemplars and counter-exemplars could be fundamental for identifying brain tumors through comparison with images from magnetic resonance scans [10] and for highlighting, through saliency maps, the areas of the brain responsible for the decision of the AI system. These are only examples: there are many other cases where the knowledge of the medical staff can be augmented by the knowledge acquired by a machine learning system able to elaborate and analyze the myriad of available information.

Another important field where explainability is applicable is recommendation systems, for obtaining explainable e-commerce recommendations, explainable social recommendations, and explainable multimedia recommendations. In this context, the goal is to instill transparency in the systems but also to provide explanations to final users or system designers who are naturally involved in the loop. In e-commerce, the goal is to explain the ranking of specific recommendations of products [19, 35]. Explainable recommendations also apply to social networks for friend recommendations and for the recommendation of music, news, travel, tags in images, etc. A useful explanation for recommendation systems could be based on feature importance, revealing which items contribute positively or negatively to the recommendation. Explainability in social environments is important to increase the users' trust in the recommendations, which is fundamental for the social network's sustainability. For instance, in [33], a classifier for predicting the risk of car crashes of a driver is equipped with the shap explainer, which reveals the importance of the features recognizing the risk of collision. Understanding the reasons for recommendations is crucial because it makes users aware of the technology they are using and also of the online behavior that enabled the specific recommendation.

Unveiling and interpreting the lending decisions made by an AI-based system is fundamental from the legal point of view and for increasing the social acceptance of these systems. Indeed, systems based on machine learning models pick up biases from the training data, which can lead to learning possibly discriminatory behavior against protected groups. In these contexts, interpretability can help in debugging, aimed at detecting those biases, and in understanding how to obtain a model that minimizes loan defaults while also avoiding discrimination due to certain demographic biases [22]. As a consequence, explainable AI in this setting has a double goal: providing clarification to end users about the reasons for the final decisions, and providing automated feedback to constantly improve the AI system so as to eliminate possible ethical issues.

The application domains just discussed are only some of the possible applications of explainable AI. With the advancement of AI research, the need for explainability will tend to increase more and more, because the complexity of the models could jeopardize their usability. Clearly, research on explainable AI still requires considerable effort, especially in terms of personalized and interactive explanations, i.e., in the study of methods able to provide explanations adaptable to the user's background and enabling human interaction, creating a beneficial human-machine loop in which the machine learns from humans and humans learn from the machine.