Global Sensitivity Analysis for the Interpretation of Machine Learning Algorithms

Kuhnt, Sonja; Kalka, Arkadius

doi:10.1007/978-3-031-07155-3_6

Sonja Kuhnt³ &
Arkadius Kalka³

878 Accesses
3 Citations

Abstract

Global sensitivity analysis aims to quantify the importance of model input variables for a model response. We highlight the role sensitivity analysis can play in interpretable machine learning and provide a short survey on sensitivity analysis with a focus on global variance-based sensitivity measures like Sobol’ indices and Shapley values. We discuss the Monte Carlo estimation of various Sobol’ indices as well as their graphical presentation in the so-called FANOVA graphs. Global sensitivity analysis is applied to an analytical example, a Kriging model of a piston simulator and a neural net model of the resistance of yacht hulls.

Access provided by Autonomous University of Puebla. Download chapter PDF

Understanding the Uncertainty Using Sensitivity Analysis in Artificial Neural Networks

Simulation Optimization Through Regression or Kriging Metamodels

Response Surface Methodology

Keywords

1 Introduction

Machine learning is a set of methods that improve automatically through experience, i.e. it is based on data. Popular machine learning methods are, e.g. support vector machines (SVMs [2]), artificial neural networks (ANNs) and random forests (RFs). Machine learning algorithms are increasingly applied in science and business and have achieved impressive performances in diverse tasks, outperforming humans. However, for several machine learning algorithms it is hard to tell what the machine has actually learned from the data. For example, in the case of ANNs, what was learned is hidden in the weights and biases of the neurons involved. If a machine learning model performs well, one might simply trust the model and ignore why it made a certain decision. However, such an attitude goes against human curiosity and thirst for knowledge. This raises the issue of interpretability [24, 25]. The straightforward way to achieve interpretability in statistical learning is to use only interpretable models. Interpretable models are, e.g. linear models, generalized linear models [23], generalized additive models [12], decision trees and rules. On the other hand, model-agnostic interpretation methods are more flexible and can be applied to any machine learning algorithm. Graphical model-agnostic methods are, e.g. Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE) plots. We suggest to provide model-agnostic methods to evaluate the influence of different regressors and their interactions by applying methods from the statistical field of global sensitivity analysis (GSA). Sensitivity analysis is the study of how the uncertainty in the output of a mathematical model or function can be divided and allocated to different sources of uncertainty in its inputs [14, 29]. Given any real-valued function on several variables –whether analytical or given by a black-box– one wants to know which input variables affect the variability of the function the most. GSA has proven to be a valuable tool in analysing expensive to evaluate computer models with a surrogate model, e.g. a Kriging model, build first. Cheng et al. [3] use support vector regression as surrogate model within GSA, whereas [37] built a new feature selection approach upon GSA. Like in [4] we suggest to achieve an understanding and interpretability of, e.g. ANNs, SVMs and RFs by combining GSA and visualization.

2 Global Sensitivity Analysis

This section reviews sensitivity analysis with a focus on global variance-based sensitivity measures, but we also discuss derivative-based global sensitivity measures briefly.

2.1 Global Sensitivity Indices

Consider a function $f: \Delta \subseteq \mathbb {R}^d \rightarrow \mathbb {R}$ that is square integrable w.r.t. a d-dimensional product measure μ. The functional analysis of variance (FANOVA) decomposition (also called Hoeffding-Sobol’ decomposition) of f ∈ L ²(μ) is the unique decomposition

$$\displaystyle \begin{aligned} f(X)=f_0+ \sum_i f_i(X_i) + \sum_{i<j} f_{i,j}(X_i,X_j) + \cdots + f_{1,\ldots, d}(X_1, \ldots, X_d) \end{aligned} $$

(1)

such that E(f _I(X _I)∣X _J) = 0 for all J ⊂ I ⊆ [d] := {1, …, d}. In particular, we have E(f _I(X _I)) = 0 for all ∅≠I ⊆ [d], see e.g. [6]. Furthermore, this implies orthogonality of all summands in the decomposition, i.e. $E\left (f_I(X_I)f_J(X_J)\right )=0$ for all I≠J ⊆ [d]. The FANOVA decomposition can be computed recursively by

$$\displaystyle \begin{aligned} f_0=E(f(X)), \,\, \mathrm{and} \,\, f_I(X_I)=E(f(X)\mid X_I)- \sum_{J\subset I} f_J(X_J). \end{aligned} $$

(2)

Orthogonality allows for an ANOVA like decomposition of the variance of f:

$$\displaystyle \begin{aligned} D=Var(f(X))= \sum_{I\subseteq [d]} Var(f_I(X_I)). \end{aligned} $$

(3)

The variance of each term, D _I = V ar(f _I(X _I)), gives a sensitivity index of its effect. The standardized indices

$$\displaystyle \begin{aligned} S_I=D_I/D \end{aligned} $$

(4)

are known as Sobol’ indices [31]. Especially, Sobol’ indices S _i = S _{i} for individual variables are referred to as first-order indices and S _ij = S _{i,j} as second-order indices. The same holds for the unstandardized versions D _i and D _ij. Sobol’ introduced the closed sensitivity index to describe the influence of a group of variables:

$$\displaystyle \begin{aligned} D_I^{cl}= \sum_{J \subseteq I} D_J=Var(E(f(X)|X_I)). \end{aligned} $$

(5)

The total sensitivity index by Homma and Saltelli [13] describes the total contribution of a set of variables including all interactions of any order and is defined by all partial variances containing at least one of the variables, i.e.

$$\displaystyle \begin{aligned} D_I^T=\sum_{I \cap J\ne \emptyset} D_J, \quad S_I^T=\frac{D_I^T}{D}. \end{aligned} $$

(6)

For I = {i}, this total sensitivity index is defined by considering all supersets. An extension [21] of the concept of superset importance is given by

$$\displaystyle \begin{aligned} D_I^{sup}=\sum_{J \supseteq I} D_J. \end{aligned} $$

(7)

In particular, the unnormalized and normalized total interaction indices (TIIs) [9] are given by

$$\displaystyle \begin{aligned} D_{i,j}^{sup}=\sum_{J \supseteq \{i,j\}} D_J \quad \mbox{ and } \quad S_{i,j}^{sup}=\sum_{J \supseteq \{i,j\}} \frac{D_J}{D}. \end{aligned} $$

(8)

So, each of these indices characterizes a different aspect of the sensitivity of the model response to individual input variables or interactions between them.

2.2 Shapley Values

A similar problem to the FANOVA decomposition has been studied in game theory and economics, namely the problem of attributing the value created in a team effort to individual team members. Consider the setting where one can measure the value $val(I)\in \mathbb {R}$ created by any subset I ⊆ [d] of the d-member team. In that case the so-called Shapley values ϕ _i are the unique choice that satisfy the following four natural criteria [30, 36].

1.
(Efficiency) $\sum _i^d \phi _i=val([d])$.
2.
(Symmetry) val(I ∪ i) = val(I ∪ j) ∀I ⊆ [d] ∖{i, j} implies ϕ _i = ϕ _j.
3.
(Dummy) val(I ∪ i) = val(I) ∀I ⊆ [d] implies ϕ _i = 0.
4.
(Additivity) The game with value val ⁽¹⁾ + val ⁽²⁾ has Shapley values ϕ ⁽¹⁾ + ϕ ⁽²⁾ with ϕ ⁽¹⁾ = ϕ(val ⁽¹⁾) and ϕ ⁽²⁾ = ϕ(val ⁽²⁾).

Then the Shapley value of an individual variable is given by

$$\displaystyle \begin{aligned} \phi_i=\frac{1}{d} \sum_{I\subseteq [d]\setminus\{i\}} {d-1 \choose |I|}^{-1}(val(I \cup i)-val(I)). \end{aligned} $$

(9)

Shapley values are connected to the FANOVA decomposition by Owen [27]. In that context, for any subset I of input variables, their combined value val(I) is the “variance explained” in the FANOVA decomposition. More precisely, the choice in [27] is $val(I)=D_I^{cl}$. Then, using the properties (1) − (4), it can be shown that the Shapley value is

$$\displaystyle \begin{aligned} \phi_i=\sum_{I\subseteq [d], i\in I} \frac{D_I}{|I|}\end{aligned} $$

(10)

according to Theorem 1 in [27]. The Shapley value does not coincide with any first-order Sobol’ index, but it is bracketed between the closed and total sensitivity index [27]:

$$\displaystyle \begin{aligned} D_i^{cl} \le \phi_i \le D_i^T. \end{aligned} $$

(11)

A normalized Shapley value may be defined as $\phi ^*_i=\phi _i/D$. Because these indices are comparatively easy to compute, Sobol’ indices provide effectively computable bounds for the Shapley value. An exact computation of the Shapley value is computationally expensive because there are 2^d subsets of [d], representing coalitions of variables. Štrumbelj and Kononenko [34] and Song et al. [33] propose effective algorithms to estimate Shapley values using Monte Carlo sampling.

2.3 Derivative-Based Global Sensitivity Measures

Based on the work of [32] derivative-based global sensitivity measures (DGSM) were introduced by Kucherneko et al. [20] as

$$\displaystyle \begin{aligned} \nu_i= \int \left(\frac{\partial f}{\partial x_i}(x)\right)^2 d\mu. \end{aligned} $$

(12)

A normalized DGSM can be defined by $ \nu ^*_i=\nu _i/ \sum _j^d v_j. $ DGSMs are not associated with a functional decomposition, but they are connected to total sensitivity indices by the inequality $D_i^T\le C(\mu _i)\nu _i$ if for the measure μ the Poincare inequality

$$\displaystyle \begin{aligned} \int g(x)^2\mu \le C(\mu)\int ||\nabla g(x)||{}^2 d\mu \end{aligned} $$

(13)

holds for all centred functions g ∈ L ²(μ) with $\int g(x)d\mu =0$ and ||∇g||∈ L ²(μ). Friedman and Popescu [7] introduced crossed DSGMs, in particular, for interactions:

$$\displaystyle \begin{aligned} \nu_{i,j}= \int \left(\frac{\partial ^2f}{\partial x_i\partial _j}(x)\right)^2 d\mu. \end{aligned} $$

(14)

Roustant et al. [28] provide an inequality to link crossed DGSMs to superset importance.

2.4 Estimation of Indices

For analytically tractable test functions, the indices above may be calculated by evaluating the integrals involved. In general, the function f is not known analytically and will be treated as black-box function. In Monte Carlo estimation, we take a high number of n samples x ⁽¹⁾, …, x ⁽ⁿ⁾ from the distribution μ and approximate the integral by

$$\displaystyle \begin{aligned} \frac{1}{n} \sum_{k=1}^{n} f(x^{(k)}) \quad \stackrel{n\rightarrow \infty}{\longrightarrow} \int f(x)d\mu=E(f(X)). \end{aligned} $$

(15)

The approximation is unbiased and convergent with probability one according to the law of large numbers. For the estimation, we require a representation of the sensitivity indices that is suitable for Monte Carlo integration. A popular choice is based on the pick-and-freeze formula $D_I^{cl}=E(f(X)f(X_I,Z_{-I}))-f_0^2$ which gives the pick-and-freeze Monte Carlo estimator

$$\displaystyle \begin{aligned} \hat{D}_I^{cl}=\frac{1}{n} \sum_{k=1}^{n} f(x^{(k)})f\left(x^{(k)}_I,z^{(k)}_{-I}\right) -f_0^2. \end{aligned} $$

(16)

Here Z is an independent copy of the random variable X, and − I denotes the complement set [d] ∖ I. Since, the pick-and-freeze estimator gets a large variance if f ₀ is large, other formulas have been suggested that avoid the subtraction of $f_0^2$ [18, 31]. In particular, the total sensitivity index can be computed using the Jansen formula $D_I^T=1/2 E((f(X)-f(Z_I,X_{-I}))^2)$.

Computationally cheaper than Monte Carlo estimation, but also slightly biased, are frequency-based estimation methods. The first frequency-based estimation method was the so-called Fourier amplitude sensitivity test (FAST) by Cukier et al. [5]. TIIs can be easily estimated via the relationship with closed sensitivity indices using pick-and-freeze. Of particular interest is a direct method using the formula of [21]:

$$\displaystyle \begin{aligned} \begin{array}{rcl} D_{i,j}^{sup}& =&\displaystyle \frac{1}{4}E((f(X_i,X_j,X_{-\{i,j\}})-f(X_i,Z_j,X_{-\{i,j\}}) \end{array} \end{aligned} $$

(17)

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle -f(Z_i,X_j,X_{-\{i,j\}}) +f(Z_i,Z_j,X_{-\{i,j\}}))^2). \end{array} \end{aligned} $$

(18)

The corresponding Liu-Owen Monte Carlo estimator is unbiased, and it is non-negative since it is a sum of squares. This implies that if the true TII is zero, then the estimator is zero as well.

3 Visualizing Interaction Structures by FANOVA Graphs

In this section the FANOVA graph, an intuitive tool to visualize the most valuable information of the FANOVA decomposition, is introduced [8, 10, 26]. Estimation and thresholding of FANOVA graphs is discussed, and we apply GSA to a standard non-linear test function.

3.1 General Idea of FANOVA Graphs

Usually, it is infeasible to look at all 2^d − 1 terms of the decomposition of a function with d input variables individually, and quite often only main effect Sobol’ indices are considered. The primal intention of FANOVA graphs is to overcome this problem and to visualize the interaction structure contained in the FANOVA decomposition by a mathematical graph [26]. The so-called FANOVA graph is defined as a graph G = (V, E) where each of the d input variables is identified by an element of the vertex set V = {1, …, d}. An edge is included in the edge set (i, j) ∈ E iff there exists a superset J ⊇{i, j} such that f _J(X _J)≠0. That is, the pair of input variables (X _i, X _j) has a non-zero two-way interaction or is involved in a higher order non-zero interaction. Equivalently an edge (i, j) is not in G iff all Sobol’ indices S _J = 0 for J ⊇{i, j}. This is exactly captured by a non-zero TII, i.e. $S_{i,j}^{sup}\neq 0$.

A FANOVA graph can be further enhanced by displaying the thicknesses of each edges (i, j) proportional to the strength of the TII of the two involved input variables. In the same way, each vertex i can be displayed by circles with lines proportional in strength to the main effect Sobol’ index S _i.

Let us consider the so-called Ishigami function which is frequently used for illustrating sensitivity analysis [16]. The function, given by

$$\displaystyle \begin{aligned} f(X_1,X_2,X_3)=\sin{}(X_1) +7\sin^2(X_2) +0.1X_3^4 \sin{}(X_1), \end{aligned} $$

(19)

depends on three input variables (X ₁, X ₂, X ₃) and obviously contains a non-linear interaction between X ₁ and X ₃ (see Fig. 1c). For this test function Sobol’ indices can be computed analytically. Assuming a uniform distribution on [−π, π] for each input variable, analytical calculation of the FANOVA decomposition and Sobol’ indices gives us the following values

$$\displaystyle \begin{aligned} D_1= 4.346, \, D_2 = 6.125, \, D_3 = 0, \, D_{12} = 0, \, D_{13} = 3.374, \, D_{23} = 0, \, D_{123} = 0.\end{aligned} $$

(20)

This leads to the following first-order Sobol’ indices and normalized TIIs

$$\displaystyle \begin{aligned}S_1=0.314, \, S_2 =0.442 , \, S_3 = 0, \, S_{12}^{sup} = 0, \, S_{13}^{sup} = 0.244, \, S_{23}^{sup} = 0.\end{aligned} $$

(21)

Figure 2 shows a bar plot and the FANOVA graph displaying the Sobol’ indices and TIIs for the Ishigami function. Main effect stands for the normalized first-order Sobol’ indices and interaction is the difference between the scaled total sensitivity index and the Sobol’ index. From the FANOVA graph it becomes immediately obvious that the input variable X ₂ has the highest impact on the response, followed by X ₁. Also, the interaction between X ₁ and X ₂ is easily detected.

In summary, FANOVA graphs visualize both first- and second-order GSA. First-order analysis in the sense of detecting the inputs X _i for which S _i is very small or even zero. Second-order in the sense of looking at pairs of input variables and detecting influential interactions and their strength, i.e. the pairs {i, j} with $S_{i,j}^{sup}> 0$.

3.2 Estimation and Thresholding

In practice, S _i and $S_{i,j}^{sup}$ are usually not analytically available and replaced by estimates $\hat {S}_{i,j}^{sup}$ and $\hat {S}_i$. Moreover, we often even apply GSA not to the actual black-box model but a meta-model or surrogate model of it. Then, estimates are typically not exactly equal to zero even if the “true” or analytically calculated sensitivity index would be. The resulting graph becomes confusing and uninformative. Therefore, edges (i, j) may be included into the graph only if

$$\displaystyle \begin{aligned} \hat{S}_{i,j}^{sup}>\delta \end{aligned} $$

(22)

for some small threshold δ, e.g. δ = 0.01 [26]. The computation of the FANOVA graph has been implemented in the R package fanovaGraph, providing several estimation methods as well as a thresholding functionality [8, 10].

To exemplify, let us now assume that we cannot analyse the Ishigami function analytically. Based on a random Latin hypercube design with 100 design points, we build a Kriging model of the Ishigami function. Our Kriging model has the usual Matern 5/2 covariance structure, no trend and no nugget effect. Table 1 shows the results for the estimators of the first-order Sobol’ indices $\hat {S}_i$, normalized Shapley values $\hat \phi _i^*$ and TIIs $\hat {S}_{i,j}^{sup}$ of the Kriging model. These estimators have been computed using the R packages fanovaGraph [10] and sensitivity [15]. Remember that the inequalities

$$\displaystyle \begin{aligned} S_i^{cl} \le \phi_i^* \le S_i^T \end{aligned} $$

(23)

hold and that $S_i^{cl}=S_i$, which is reflected by the order of the values in Table 1. Comparison also shows that the estimates slightly deviate from the true values given above.

Table 1 Sobol’ indices (first order) and TIIs for the Kriging model of the Ishigami function

Full size table

Figure 3 displays on the left hand side the resulting pure FANOVA graph. This is a complete graph as all estimated TIIs are different from zero, even if only slightly. Therefore, we threshold the values by δ = 0.025 and gain the graph on the right hand side, which is the same as for the analytical evaluation of the Ishigami function. The TII’s and the FANOVA graph help to discover an underlying block-additive structure of the function f, i.e. we can find a decomposition into cliques of input variables such that variables in different cliques do not interact. As outlined in [26], the detected interaction structure by the FANOVA graph can be a valuable aid in constructing block-additive Kriging models. Therefore, the fanovaGraph package also contains methods for block-additive Kriging analysis. The block-additive decomposition provided by the FANOVA graph can also be used in a parallelized global optimization procedure [17].

4 Fields of Applications

The interpretation of a machine learning model by global sensitivity indices and FANOVA graph is in general applicable to any kind of model with a continuous response variable. We show examples of a Kriging model of a piston simulator and an ANN of resistance of sailing yachts.

4.1 Kriging Model of a Piston Simulator

As an example for the application in the field of the design and analysis of computer experiments we are using the piston simulator from the mistat package in R [1], first presented in [19]. A piston is moving within a cylinder. The piston’s performance is measured by the time it takes to complete one cycle, in seconds. Here, we take the mean of 50 cycles as response, since the cycle time of the piston fluctuates strongly. The following factors can affect the piston’s performance. The ranges, in which these factors are varied uniformly in our sensitivity analysis, are given in brackets.

m	The impact pressure determined by the piston mass (30–60) [kg].
S	The piston surface area (0.005–0.020) [m ²].
V ₀	The initial volume of the gas inside the piston (0.002-0.010) [m ³].
k	The spring coefficient (1000–5000) [N∕m ³].
p ₀	The atmospheric pressure (9 ⋅ 10⁴ − 11 ⋅ 10⁴) [N∕m ²].
T	The surrounding ambient temperature (290–296) [K].
T ₀	The filling gas temperature (340–360) [K].

Based on a random Latin hypercube design with 70 design points, we build a Kriging model of the piston simulator. The Kriging model has a Matern 5/2 covariance kernel, no trend and no nugget effect. Table 2 shows the results for the Sobol’ indices (first-order and total) and the Shapley values of the piston simulator. The slightly negative value for, e.g. $\hat \phi ^*_6$ is of course an artefact of the estimation method. We observe that the piston surface X ₂ = S and the spring coefficient X ₄ = k have the largest effect on the cycle time.

Table 2 Sobol’ indices for the Kriging model of the piston simulator

Full size table

Figure 4 displays the FANOVA graph for the Kriging model of the piston simulator after thresholding by δ = 0.005.

In Fig. 4a both the edges as well as the vertices of the graph are presented by lines proportional to the values of the respective indices. It becomes obvious that X ₂ = S and X ₄ = k have the highest impact, followed by X ₁ = m. However, as the values of the TIIs are noticeably smaller than the first-order Sobol’ indices, it is not possible to detect which interactions are the largest. Therefore, in the FANOVA graph in Fig. 4b we only vary the edges in strength proportional to the values of the TIIs. The largest TII is observed for the interaction X ₂ X ₄ = Sk with $\hat {S}^{sup}_{2,4}=0.033$.

4.2 Neural Net Model of Resistance of Sailing Yachts

The residuary resistance of a ship is its total resistance minus the viscous resistance. In this section we are studying the residual resistance of sailing yachts in dependence of their hull geometry and the yacht velocity.

The Delft systematic yacht hull series data set [11] comprises 308 = 22 ⋅ 14 experiments with yacht models of scale 6.25 performed at the Delft Ship Hydromechanics Laboratory. In total, 22 different hull forms were tested with 14 different velocities. Based on the Delft series, semi-empirical models were developed [11] which are widely used in the yacht industry [22]. The Delft data set has 6 regressors and one dependent variable, all of which are dimensionless, i.e. their unit is 1 or % or ‰. Let the weight displacement Δ be the weight of water equivalent to the immersed volume of the hull. Then the dependent variable is the ratio R _r∕ Δ of the residuary resistance R _r to the weight displacement, given in ‰. The independent variables are as follows.

X ₁ :: The longitudinal centre of buoyancy (LCB) is the longitudinal distance, given in % of some characteristic length, from a point of reference (often midships) to the centre of the displaced volume of water.
X ₂ :: The prismatic coefficient C _p = ∇∕L _WL A _m with A _m the cross-sectional area of the underwater slice at midships. C _p displays the ratio of the immersed volume of the hull to a volume of a prism with equal length and cross-sectional area A _m.
X ₃ :: The length-displacement ratio L _WL∕∇^1∕3 where the volume displacement ∇ is the volume of water displaced by the hull.
X ₄ :: The beam-draught ratio B _WL∕T where the draught T is the maximal distance from the water line to the bottom of the keel.
X ₅ :: The length-beam ratio L _WL∕B _WL is the ratio of length to maximal width at water line.
X ₆ :: The Froude number $Fr=u/\sqrt {gL_{WL}}$. Here u is the flow velocity relative to the yacht, g the gravitational acceleration, and L _WL is the length of the hull at water line.

We train a single hidden layer ANN to learn the relationship between input and output variables in the Delft data set. Such ANNs are implemented in the nnet package in R [35]. For regression, we choose an ANN with a linear activation function to the output neuron. The data set is divided into training, validation and testing subsets, containing 50%, 25% and 25% of the samples, respectively. The hyperparameter to be tuned is the number n of neurons in the hidden layer. We choose the ANN with lowest validation error, i.e. highest R ²-value for the validation data. That is according to Table 3 an ANN with 8 hidden neurons, i.e. a 6-8-1 net with 6 ⋅ 8 + 8 = 56 weights and 8 + 1 = 9 biases.

Table 3 R ² values for trainings, validation and test data for ANNs with different number of hidden neurons

Full size table

For the chosen ANN as black-box function we perform a GSA. We compute the Sobol’ indices as well as the scaled TIIs using the Liu-Owen method with n = 100, 000 Monte Carlo samples with the help of the fanovaGraph package in R. Table 4 displays these sensitivity indices with the following coding of the regressors X ₁ = LCB, X ₂ = C _p, X ₃ = L _WL∕∇^1∕3, X ₄ = B _WL∕T, X ₅ = L _WL∕B _WL and X ₆ = Fr. The Sobol’ indices and scaled TIIs are graphically displayed in the bar plot in Fig. 5a.

Table 4 Sobol’ and Shapley values for the neural net model

Full size table

Figure 5 displays the FANOVA graph for the ANN model with and without thresholding. The Froude number, a proxy for velocity, has by far the largest impact on the residuary resistance. The largest interactions are X ₁ X ₄, X ₄ X ₆ and X ₁ X ₆.

5 Summary

We have discussed the usefulness of GSA as a tool for interpretable machine learning. Global sensitivity indices based on Sobol’ indices, Shapley values as well as derivative-based global sensitivity measures are revisited. FANOVA graphs allow for a very intuitive visualization of interaction structures and the strength of first-order Sobol’ indices and TIIs. The approach is exemplified with a Kriging meta-model for a piston simulator and an ANN model for the resistance of yachts.

References

Amberti, D.: mistat: Data Sets, Functions and Examples from the Book: “Modern Industrial Statistics” by Kenett, Zacks and Amberti (2018). https://CRAN.R-project.org/package=mistat. R package version 1.0-5
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Haussler, D. (ed.) Proceedings of the Fifth Annual Workshop on Computational Learning Theory – COLT ’92, pp. 144–152. ACM Press, New York (1992). https://doi.org/10.1145/130385.130401
Chapter Google Scholar
Cheng, K., Lu, Z., Zhou, Y., Shi, Y., Wei, Y.: Global sensitivity analysis using support vector regression. Appl. Math. Model. 49(4), 587–598 (2017). https://doi.org/10.1016/j.apm.2017.05.026
Article MathSciNet MATH Google Scholar
Cortez, P., Embrechts, M.J.: Using sensitivity analysis and visualization techniques to open black box data mining models. Inf. Sci. 225(1), 1–17 (2013). https://doi.org/10.1016/j.ins.2012.10.039
Article Google Scholar
Cukier, R., Levine, H., Shuler, K.: Nonlinear sensitivity analysis of multiparameter model systems. J. Comput. Phys. 26(1), 1–42 (1978). https://doi.org/10.1016/0021-9991(78)90097-9
Article MathSciNet MATH Google Scholar
Efron, B., Stein, C.: The jackknife estimate of variance. Ann. Stat. 9(3) (1981). https://doi.org/10.1214/AOS/1176345462
Friedman, J.H., Popescu, B.E.: Predictive learning via rule ensembles. Ann. Appl. Stat. 2(3), 916–954 (2008). https://doi.org/10.1214/07-AOAS148
Article MathSciNet MATH Google Scholar
Fruth, J., Roustant, O., Muehlenstaedt, T.: The fanovaGraph Package: Visualization of Interaction Structures and Construction of Block-additive Kriging Models (2013). https://hal.archives-ouvertes.fr/hal-00795229
Fruth, J., Roustant, O., Kuhnt, S.: Total interaction index: a variance-based sensitivity index for second-order interaction screening. J. Stat. Plann. Infer. 147, 212–223 (2014). https://doi.org/10.1016/j.jspi.2013.11.007
Article MathSciNet MATH Google Scholar
Fruth, J., Muehlenstaedt, T., Roustant, O., Jastrow, M., Kuhnt, S.: fanovaGraph: building Kriging Models from FANOVA Graphs (2020). https://CRAN.R-project.org/package=fanovaGraph. R package version 1.5
Gerritsma, J., Onnink, R., Versluis, A.: Geometry, resistance and stability of the delft systematic yacht hull series. Int. Shipbuild. Prog. 28(328), 276–297 (1981). https://doi.org/10.3233/ISP-1981-2832801
Article Google Scholar
Hastie, T., Tibshirani, R.J.: Generalised additive models. In: Monographs on Statistics and Applied Probability, vol. 43. Chapman and Hall, London (1990)
Google Scholar
Homma, T., Saltelli, A.: Importance measures in global sensitivity analysis of nonlinear models. Reliab. Eng. Syst. Saf. 52(1), 1–17 (1996). https://doi.org/10.1016/0951-8320(96)00002-6
Article Google Scholar
Iooss, B., Lemaître, P.: A review on global sensitivity analysis methods. In: Dellino, G., Meloni, C. (eds.) Uncertainty Management in Simulation-Optimization of Complex Systems, Operations Research/Computer Science Interfaces Series, vol. 59, pp. 101–122. Springer US, Boston (2015). https://doi.org/10.1007/978-1-4899-7547-8_5
Google Scholar
Iooss, B., Veiga, S.D., Janon, A., Pujol, G., with contributions from Baptiste Broto, Boumhaout, K., Delage, T., Amri, R.E., Fruth, J., Gilquin, L., Guillaume, J., Le Gratiet, L., Lemaitre, P., Marrel, A., Meynaoui, A., Nelson, B.L., Monari, F., Oomen, R., Rakovec, O., Ramos, B., Roustant, O., Song, E., Staum, J., Sueur, R., Touati, T., Weber, F.: Sensitivity: global Sensitivity Analysis of Model Outputs (2020). https://CRAN.R-project.org/package=sensitivity. R package version 1.22.1
Ishigami, T., Homma, T.: An importance quantification technique in uncertainty analysis for computer models. In: 1990 Proceedings of First International Symposium on Uncertainty Modeling and Analysis, pp. 398–403. IEEE Computer Society Press, Washington (1990). https://doi.org/10.1109/ISUMA.1990.151285
Ivanov, M., Kuhnt, S.: A parallel optimization algorithm based on FANOVA decomposition. Qual. Reliab. Eng. Int. 30(7), 961–974 (2014). https://doi.org/10.1002/qre.1710
Article Google Scholar
Jansen, M.J.: Analysis of variance designs for model output. Comput. Phys. Commun. 117(1–2), 35–43 (1999). https://doi.org/10.1016/S0010-4655(98)00154-4
Article MATH Google Scholar
Kenett, R., Zacks, S., Amberti, D.: Modern Industrial Statistics: With Applications in R, MINITAB and JMP, 2nd edn. Statistics in Practice. Wiley, Chichester (2014)
Google Scholar
Kucherenko, S., Rodriguez-Fernandez, M., Pantelides, C., Shah, N.: Monte Carlo evaluation of derivative-based global sensitivity measures. Reliab. Eng. Syst. Saf. 94(7), 1135–1148 (2009). https://doi.org/10.1016/j.ress.2008.05.006
Article Google Scholar
Liu, R., Owen, A.B.: Estimating mean dimensionality of analysis of variance decompositions. J. Am. Stat. Assoc. 101(474), 712–721 (2006). https://doi.org/10.1198/016214505000001410
Article MathSciNet MATH Google Scholar
Lopez Gonzalez, R.: Neural networks for variational problems in engineering. PhD Thesis. Technical University of Catalonia (2008)
Google Scholar
McCullagh, P., Nelder, J.A.: Generalized linear models. Monographs on Statistics and Applied Probability, vol. 37, 2nd edn. Chapman and Hall, London (1989)
Google Scholar
Molnar, C.: Interpretable Machine Learning. lulu.com (2020)
Google Scholar
Molnar, C., Casalicchio, G., Bischl, B.: Quantifying model complexity via functional decomposition for better post-hoc interpretability. In: Cellier, P., Driessens, K. (eds.) Machine Learning and Knowledge Discovery in Databases. Communications in Computer and Information Science, vol. 1167, pp. 193–204. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-43823-4_17
Google Scholar
Muehlenstaedt, T., Roustant, O., Carraro, L., Kuhnt, S.: Data-driven kriging models based on Fanova-decomposition. Stat. Comput. 22(3), 723–738 (2012). https://doi.org/10.1007/s11222-011-9259-7
Article MathSciNet MATH Google Scholar
Owen, A.B.: Sobol’ indices and Shapley value. SIAM/ASA J. Uncertain. Quantif. 2(1), 245–251 (2014). https://doi.org/10.1137/130936233
Article MathSciNet MATH Google Scholar
Roustant, O., Fruth, J., Iooss, B., Kuhnt, S.: Crossed-derivative based sensitivity measures for interaction screening. Math. Comput. Simul. 105, 105–118 (2014). https://doi.org/10.1016/j.matcom.2014.05.005
Article MathSciNet MATH Google Scholar
Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., Tarantola, S.: Global Sensitivity Analysis. The Primer. John Wiley & Sons, Ltd, Chichester (2007). https://doi.org/10.1002/9780470725184
Shapley, L.S.: A value for n-person games. In: Kuhn, H.W., Tucker, A.W. (eds.) Contributions to the Theory of Games (AM-28), vol. II, pp. 307–318. Princeton University Press, Princeton (1953). https://doi.org/10.1515/9781400881970-018
Google Scholar
Sobol, I.M.: Sensitivity analysis for non-linear mathematical models. Math. Modeling Comput. Experiment 1(4), 407–414 (1993)
MathSciNet MATH Google Scholar
Sobol, I., Gershman, A.: On an alternative global sensitivity estimators. In: Proceedings of SAMO 1995, Belgirate, pp. 40–42 (1995)
Google Scholar
Song, E., Nelson, B.L., Staum, J.: Shapley effects for global sensitivity analysis: theory and computation. SIAM/ASA J. Uncertain. Quantif. 4(1), 1060–1083 (2016). https://doi.org/10.1137/15M1048070
Article MathSciNet MATH Google Scholar
Štrumbelj, E., Kononenko, I.: Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 41(3), 647–665 (2014). https://doi.org/10.1007/s10115-013-0679-x
Article Google Scholar
Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S, 4th edn. Springer, New York (2002). ISBN 0-387-95457-0. http://www.stats.ox.ac.uk/pub/MASS4
Book MATH Google Scholar
Winter, E.: The Shapley value. In: Handbook of Game Theory with Economic Applications, vol. 3, pp. 2025–2054. Elsevier, Amsterdam (2002). https://doi.org/10.1016/S1574-0005(02)03016-3
Zhang, P.: A novel feature selection method based on global sensitivity analysis with application in machine learning-based prediction model. Appl. Soft Comput. 85, 105859 (2019). https://doi.org/10.1016/j.asoc.2019.105859
Article Google Scholar

Download references

Acknowledgements

The financial support of the Deutsche Forschungsgemeinschaft (SFB 823, project B1) is gratefully acknowledged.

Author information

Authors and Affiliations

University of Applied Sciences and Arts, Dortmund, Germany
Sonja Kuhnt & Arkadius Kalka

Authors

Sonja Kuhnt
View author publications
You can also search for this author in PubMed Google Scholar
Arkadius Kalka
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sonja Kuhnt .

Editor information

Editors and Affiliations

Institute of Statistics and AI Center, RWTH Aachen University, Aachen, Germany
Ansgar Steland
Grado Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Kwok-Leung Tsui

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Kuhnt, S., Kalka, A. (2022). Global Sensitivity Analysis for the Interpretation of Machine Learning Algorithms. In: Steland, A., Tsui, KL. (eds) Artificial Intelligence, Big Data and Data Science in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-031-07155-3_6

Download citation

DOI: https://doi.org/10.1007/978-3-031-07155-3_6
Published: 15 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07154-6
Online ISBN: 978-3-031-07155-3
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

Global Sensitivity Analysis for the Interpretation of Machine Learning Algorithms

Abstract

Similar content being viewed by others

Understanding the Uncertainty Using Sensitivity Analysis in Artificial Neural Networks