1 Introduction

In recent years, there has been growing interest in extracting patterns from data using artificial neural network (ANN)-based modelling techniques. The use of these models in real-life scenarios is becoming a primary focus area across industries and among data analytics practitioners. It is well established that ANN-based models provide a flexible framework for building models with high predictive performance on large and complex data. Unfortunately, due to the high degree of complexity of ANN models, the interpretability of the results can be significantly reduced, and such models have come to be called “black boxes” in this community. For example, in a banking system that detects fraud, a robo-advisor for securities consulting, or the opening of a new account in compliance with the KYC process, there are no mechanisms in place that make the results understandable. The risk with this type of complex computing machine is that customers or bank employees are left with a series of questions after a consultation or decision which the banks themselves cannot answer: “Why did you recommend this share?”, “Why was this person rejected as a customer?”, “How does the machine classify this transaction as terror financing or money laundering?”. Naturally, industries are focusing more and more on transparency and understanding when deploying artificial intelligence and complex learning systems.

This has opened a new direction of research aimed at developing approaches to understand model behaviour and explain model structure. Recently, Joel et al. (2018) developed an explainable neural network model based on additive index models to learn interpretable network connectivity. However, this is still not sufficient to understand the significance of the features used in the model or whether the model is well specified.

In this article, we express the neural network (NN) model as a nonlinear regression model and use statistical measures to interpret the model parameters and the model specification under certain assumptions. We consider only multilayer perceptron (MLP) networks, which form a very flexible class of statistical procedures. The article is arranged as follows: (a) the structure of the MLP as a feed-forward neural network expressed as a nonlinear regression model, (b) the estimation of the parameters, (c) the properties of the parameters and their asymptotic distribution, and (d) a simulation study and conclusion.

Fig. 1: A multilayer perceptron neural network: MLP network with three layers

2 Transparent Neural Network Model (TRANN)

In this article, we consider the MLP structure given in Fig. 1. Each neural network can be expressed as a function of the explanatory variables \(X=\left[ x_1,x_2,\ldots ,x_p\right] \) and the network weights \(\omega =\left( \gamma ^{\prime },\beta ^\prime ,b^\prime \right) \), where \(\gamma ^\prime \) holds the weights between the input and hidden layers, \(\beta ^\prime \) holds the weights between the hidden and output layers and \(b^\prime \) holds the biases of the network. The network has the following functional form

$$\begin{aligned} F(X,\omega ) = \sum _{h=1}^{H}\beta _{h}g(\sum _{i=1}^{I}\gamma _{hi}x_{i}+b_{h})+b_{00} \end{aligned}$$
(1)

where the scalars I and H denote the number of input units and hidden units of the network and g is a nonlinear transfer function. The transfer function g can be taken as either the logistic function or the hyperbolic tangent function; in this paper, we use the logistic transfer function for all calculations. Let Y be the dependent variable; then we can write Y in nonlinear regression form

$$\begin{aligned} Y=F(X,\omega ) + \epsilon \end{aligned}$$
(2)

where \(\epsilon \) is \( {i.i.d}\) normal with E\([\epsilon ]=0\) and E\([\epsilon \epsilon ^{\prime }]=\sigma ^{2} I\). Equation (2) can now be interpreted as a parametric nonlinear regression of Y on X, so, given data, we can estimate all the network parameters. The most important questions are then what the right architecture of the network is, how to identify the number of hidden units, and how to measure the importance of the parameters. The aim is always to identify an optimal network with a small number of hidden units which approximates the unknown function well (Sarle 1995). Therefore, it is important to derive a methodology not only to select an appropriate network but also to explain the network well for a given problem.
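
To make the notation concrete, the following minimal NumPy sketch evaluates \(F(X,\omega )\) of Eq. (1) with a logistic transfer function and generates Y according to the regression form of Eq. (2). All function and variable names (logistic, mlp_forward, the dimensions of the toy example) are our own illustration, not part of the model definition.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(z):
    # Logistic transfer function g(z) = 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, gamma, b_hidden, beta, b_out):
    # F(X, omega) of Eq. (1): X is (T, I), gamma is (H, I),
    # b_hidden is (H,), beta is (H,), b_out is the scalar b_00.
    hidden = logistic(X @ gamma.T + b_hidden)   # hidden-unit outputs, (T, H)
    return hidden @ beta + b_out                # network output, (T,)

# Toy example with I = 3 inputs and H = 2 hidden units.
I, H, T = 3, 2, 500
gamma = rng.normal(size=(H, I))
b_hidden = rng.normal(size=H)
beta = rng.normal(size=H)
b_out = 0.5

X = rng.exponential(size=(T, I))
eps = rng.normal(scale=0.1, size=T)   # i.i.d. Gaussian errors of Eq. (2)
Y = mlp_forward(X, gamma, b_hidden, beta, b_out) + eps
```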

In the network literature, the available and commonly pursued approaches are regularization, stopped training and pruning (Reed 1993). In regularization methods, the network weights are chosen by minimizing the network error (e.g. the sum of squared errors) together with a penalty term. In stopped training, the data are split into a training and a validation set, and the training algorithm is stopped when the model error on the validation set begins to grow, i.e. estimation stops when the model starts to become overparameterized or overfitted. The resulting values may not be sensible estimates of the parameters, since a growing validation error is rather an indication that the network complexity should be reduced. In pruning methods, the network parameters are retained based on their “significant” contribution to overall network performance; however, this “significance” is not judged by any theoretical construct but is more a heuristic measure of importance.

The main issue with regularization, stopped training and pruning is that they are highly judgemental in nature, which makes the model-building process difficult to reconstruct. In the transparent neural network (TRANN), we explain the statistical construction of the parameter estimates and their properties, through which we establish the statistical importance of the network weights and address the model misspecification problem. In the next section, we describe the statistical concepts needed to estimate the network parameters and derive their properties. We have carried out a simulation study to justify our claims.

3 TRANN Parameter Estimation

In general, the parameters of a nonlinear regression model cannot be determined analytically; numerical procedures are needed to find the optima of nonlinear functions, a standard problem in numerical mathematics. To estimate the parameters, we minimize the squared error, \(SE=\sum _{t=1}^{T}(Y_{t}-F(X_{t},\omega ))^2\), using the backpropagation method. Backpropagation is the most widely used algorithm for supervised learning with multilayered feed-forward networks. In the backpropagation algorithm (Rumelhart et al. 1986), repeated application of the chain rule is used to compute the influence of each weight in the network on the error function SE:

$$\begin{aligned} \frac{\partial {SE}}{\partial \omega _{ij}}=\frac{\partial {SE}}{\partial {s_{i}}}\frac{\partial {s_{i}}}{\partial {\text {net}_{i}}}\frac{\partial {\text {net}_{i}}}{\partial {\omega _{ij}}} \end{aligned}$$
(3)

where \(\omega _{ij}\) is the weight from neuron j to neuron i, \(s_{i}\) is the output, and net\(_{i}\) is the weighted sum of the inputs of neuron i. Once the partial derivatives of each weight are known, then minimizing the error function can be achieved by performing

$$\begin{aligned} \check{\omega }_{t+1}= \check{\omega }_{t}-\eta _{t}[- \nabla F(X_{t},\check{\omega }_{t})]^{\prime }[Y_{t}-F(X_{t},\check{\omega }_{t})], \quad t=1,2,\ldots ,T \end{aligned}$$
(4)
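
As a purely illustrative sketch that reuses the hypothetical mlp_forward above, the online update of Eq. (4) can be written as follows; grad_F collects the partial derivatives of Eq. (3) via the chain rule, and the caller supplies the learning rate \(\eta _{t}\) (e.g. proportional to \(t^{-1}\), as discussed below).

```python
def grad_F(x, gamma, b_hidden, beta, b_out):
    # Gradient of F(x, omega) w.r.t. all weights for a single sample x,
    # via the chain rule of Eq. (3); g'(z) = g(z)(1 - g(z)) for the logistic g.
    a = logistic(gamma @ x + b_hidden)   # hidden activations, (H,)
    d = beta * a * (1.0 - a)             # back-propagated signal, (H,)
    return {
        "gamma": np.outer(d, x),         # dF/dgamma_hi
        "b_hidden": d,                   # dF/db_h
        "beta": a,                       # dF/dbeta_h
        "b_out": 1.0,                    # dF/db_00
    }

def backprop_step(x, y, params, eta):
    # One online update of Eq. (4): omega <- omega + eta * (dF/domega) * residual.
    gamma, b_hidden, beta, b_out = params
    resid = y - mlp_forward(x[None, :], gamma, b_hidden, beta, b_out)[0]
    g = grad_F(x, gamma, b_hidden, beta, b_out)
    return (gamma + eta * resid * g["gamma"],
            b_hidden + eta * resid * g["b_hidden"],
            beta + eta * resid * g["beta"],
            b_out + eta * resid * g["b_out"])
```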

Based on the assumptions of the nonlinear regression model (2) and under some regularity conditions on F, it can be proven (White 1989) that the parameter estimator \(\hat{\omega }\) is consistent and asymptotically normal. White (1989) also showed that an asymptotically equivalent estimator can be obtained from the backpropagation estimator of Eq. (4), when \(\eta _{t}\) is proportional to \(t^{-1}\), as

$$\begin{aligned} \hat{\omega }_{t+1}&= \check{\omega }_{t}+\left[ \sum _{t=1}^{T}\nabla F(X_{t},\check{\omega }_{t})^{'}\nabla F(X_{t},\check{\omega }_{t})\right] ^{-1}\\ \nonumber&\quad \times \sum _{t=1}^{T}\nabla F(X_{t},\check{\omega }_{t})^{'}[Y_{t}-F(X_{t},\check{\omega }_{t})], \quad t=1,2,\ldots ,T \end{aligned}$$
(5)
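
Eq. (5) is a Gauss–Newton-type correction. Under the same illustrative assumptions as above, it might be sketched as a single linear solve (flat_grads and gauss_newton_increment are our own names):

```python
def flat_grads(X, params):
    # Stack dF/domega for every observation into a (T, K) matrix,
    # flattening the parameters in the order (gamma, b_hidden, beta, b_00).
    gamma, b_hidden, beta, b_out = params
    rows = []
    for x in X:
        g = grad_F(x, gamma, b_hidden, beta, b_out)
        rows.append(np.concatenate([g["gamma"].ravel(), g["b_hidden"],
                                    g["beta"], [g["b_out"]]]))
    return np.asarray(rows)

def gauss_newton_increment(X, Y, params):
    # The correction term of Eq. (5): [sum G'G]^{-1} sum G'(Y - F),
    # to be added to the flattened backpropagation estimate omega-check.
    gamma, b_hidden, beta, b_out = params
    G = flat_grads(X, params)
    resid = Y - mlp_forward(X, gamma, b_hidden, beta, b_out)
    return np.linalg.solve(G.T @ G, G.T @ resid)
```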

In that case, the usual hypothesis tests for nonlinear models, such as the Wald test or the LM test, can be applied. A neural network belongs to the class of misspecified models, since it does not map the unknown function exactly but only approximates it. The application of the asymptotic standard tests is still valid, as the misspecification can be taken care of through the calculation of the covariance matrix of the parameters (White 1994). The estimated parameters \(\hat{\omega }\) are normally distributed with mean \(\omega ^{*}\) and covariance matrix \(\frac{1}{T}C\), where the parameter vector \(\omega ^{*}\) can be considered the best projection of the misspecified model onto the true model. This leads to:

$$\begin{aligned} \sqrt{T}(\hat{\omega }-\omega ^{*})\sim N(0,C) \end{aligned}$$
(6)

where T denotes the number of observations. As per the theory of misspecified models (Anders 2002), the covariance matrix can be calculated as

$$\begin{aligned} \frac{1}{T}C = A^{-1}BA^{-1} \end{aligned}$$
(7)

where the matrices A and B are given by \(A \equiv E[\nabla ^{2} SE_{t}]\) and \(B \equiv E[\nabla SE_{t}\nabla SE_{t}^{'}]\), \(SE_{t}\) denotes the squared error contribution of the tth observation, and \(\nabla \) is the gradient with respect to the weights.

4 TRANN Model Parameter Test for Significance

Hypothesis tests for the significance of parameters are a standard instrument of any statistical model. In TRANN, we find and eliminate redundant inputs from the feed-forward single-hidden-layer network through statistical tests of significance. This helps in understanding the network and makes it possible to explain the network connections with mathematical evidence, thereby providing transparency to the model. The case of irrelevant hidden units occurs when identical optimal network performance can be achieved with fewer hidden units. In any regression method, the t-statistic plays an important role in hypothesis testing, whereas it is often overlooked in neural networks. The non-significant parameters can be removed from the network, and the network can then be uniquely defined (White 1989); this holds for linear regression as well as for neural networks. Here, we compute the t-statistic as

$$\begin{aligned} {\frac{\hat{\omega }_{k}-\omega _{H_{0}}(k)}{\hat{\sigma }_{k}}} \end{aligned}$$
(8)

where \(\omega _{H_{0}}(k)\) denotes the value or restriction to be tested under the null hypothesis \(H_{0}\), and \(\hat{\sigma }_{k}\) is the estimated standard deviation of the estimated parameter \(\hat{\omega }_{k}\). We estimate the variance–covariance matrix \(\hat{C}\), whose diagonal elements provide the variances of the \(\hat{\omega }_{k}\), as

$$\begin{aligned} \frac{1}{T}\hat{C} = \hat{A}^{-1}\hat{B}\hat{A}^{-1} \end{aligned}$$
(9)
$$\begin{aligned} \hat{A}= \frac{1}{T} \sum _{t=1}^{T}\frac{\partial ^{2}SE_{t}}{\partial \hat{\omega }\partial \hat{\omega }^{'}} \text{ and } \hat{B} = \frac{1}{T}\sum _{t=1}^{T}\hat{\epsilon }_{t}^{2}\left( \frac{\partial F(X_{t},\hat{\omega })}{\partial \hat{\omega }}\right) \left( \frac{\partial F(X_{t},\hat{\omega })}{\partial \hat{\omega }}\right) ^{'} \end{aligned}$$
(10)

where \(\hat{\epsilon }_{t}^{2}\) is the squared estimated error for the tth sample.
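
A plug-in version of Eqs. (9)–(10), continuing the illustrative sketch, can be written with full-sample sums so that the \(1/T\) factors cancel. Note that differentiating \(SE_{t}=\hat{\epsilon }_{t}^{2}\) introduces constant factors 2 and 4, and we use the common Gauss–Newton approximation to the Hessian (dropping the residual-weighted second-derivative term); both choices are our own simplifications.

```python
def sandwich_covariance(X, Y, params):
    # Plug-in sandwich estimator: Cov(omega-hat) = C-hat / T = H^{-1} S H^{-1},
    # where H approximates sum_t nabla^2 SE_t and S = sum_t nabla SE_t nabla SE_t'.
    gamma, b_hidden, beta, b_out = params
    G = flat_grads(X, params)                      # (T, K) matrix of dF/domega
    resid = Y - mlp_forward(X, gamma, b_hidden, beta, b_out)
    H = 2.0 * (G.T @ G)                            # Gauss-Newton Hessian of SE
    S = 4.0 * (G * (resid ** 2)[:, None]).T @ G    # outer products of scores
    H_inv = np.linalg.inv(H)
    return H_inv @ S @ H_inv                       # covariance of omega-hat
```

In the homoskedastic case the factors 2 and 4 cancel and this expression reduces to the familiar \(\hat{\sigma }^{2}\left[ \sum _{t}\nabla F^{'}\nabla F\right] ^{-1}\) of nonlinear least squares.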

Equation (6) implies that the asymptotic distribution of the network parameter estimates is normal, so it is possible to test the significance of each parameter using the estimated covariance matrix \(\hat{C}\). Both the Wald test and the LM test are then applicable, as per the theory of misspecified models (Anders 2002).
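
A minimal sketch of these tests, assuming the covariance estimate from the previous block (the function names and the SciPy-based p-value computation are our own choices):

```python
from scipy import stats

def t_statistics(omega_hat, cov, omega_null=None):
    # t-statistic of Eq. (8) for each parameter, with two-sided p-values
    # from the asymptotic normal distribution of Eq. (6).
    omega_null = np.zeros_like(omega_hat) if omega_null is None else omega_null
    se = np.sqrt(np.diag(cov))                 # sigma-hat_k of Eq. (8)
    t = (omega_hat - omega_null) / se
    p = 2.0 * stats.norm.sf(np.abs(t))
    return t, p

def wald_test(omega_hat, cov, idx):
    # Joint Wald test of H0: omega_k = 0 for all k in idx; under H0 the
    # statistic is asymptotically chi-squared with len(idx) degrees of freedom.
    w = omega_hat[idx]
    V = cov[np.ix_(idx, idx)]
    W = float(w @ np.linalg.solve(V, w))
    return W, stats.chi2.sf(W, df=len(idx))
```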

5 Simulation Study

We have performed a simulation study to establish the estimation methods and hypothesis tests of significance with an 8-2-1 feed-forward network, i.e. eight input variables, one hidden layer with two hidden units and one output unit. As per the structure of Eq. (1), the network model therefore contains 21 parameters, and we have set the parameter values as \(b^{\prime } = (b_{00}: 0.91, b_{1}: -0.276, b_{2}: 0.276)\)

\(\beta ^{\prime } = (\beta _{1}: 0.942, \beta _{2}: 0.284)\)

\(\gamma ^{\prime } = (\gamma _{11}= -1.8567, \gamma _{21} = -0.0185, \gamma _{31}= -0.135, \gamma _{41}= 0.743, \gamma _{51}= 0.954, \gamma _{61}= 1.38, \gamma _{71}= 1.67, \gamma _{81}= 0.512, \gamma _{12}= 1.8567, \gamma _{22}= 0.0185, \gamma _{32}= 0.135, \gamma _{42}= -0.743, \gamma _{52}= -0.954, \gamma _{62}= -1.38, \gamma _{72}= -1.67, \gamma _{82}= -0.512)\)

and the error term \(\epsilon \) is generated from a normal distribution with mean zero and standard deviation 0.001. The independent variables \(X = [x_{1}, \ldots , x_{8}]\) are drawn from an exponential distribution. We generated 100,000 observations using the above parameters, then repeatedly drew random samples of 5000 observations from these 100,000 and derived the parameter estimates and confidence intervals; we refer to this resampling scheme as the bootstrap method. The estimated parameter values, standard errors, confidence intervals, t-values and p-values obtained through bootstrapping are given in Table 1. The corresponding results based on the asymptotic properties of the estimates, computed from Eq. (9), are given in Table 2. Both methods establish the test of significance of the parameters under the null hypothesis \(H_{0} : \omega =0\).
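
Continuing the running sketch, the resampling scheme might look as follows. The fitting routine (here a SciPy nonlinear least-squares call), the number of resamples, and the perturbed starting values are our own illustrative choices and are not specified in the text; in practice one would use several random restarts per fit.

```python
from scipy.optimize import least_squares

def unpack(w, I=8, H=2):
    # Split the flat 21-vector into (gamma, b_hidden, beta, b_00).
    g = w[:H * I].reshape(H, I)
    return g, w[H * I:H * I + H], w[H * I + H:H * I + 2 * H], w[-1]

def fit_mlp(X, Y, w0):
    # Nonlinear least-squares fit of the regression model (2).
    return least_squares(lambda w: Y - mlp_forward(X, *unpack(w)), w0).x

# True parameters as listed above, flattened as (gamma, b_1, b_2, beta, b_00).
gamma_true = np.array([[-1.8567, -0.0185, -0.135, 0.743, 0.954, 1.38, 1.67, 0.512],
                       [1.8567, 0.0185, 0.135, -0.743, -0.954, -1.38, -1.67, -0.512]])
w_true = np.concatenate([gamma_true.ravel(), [-0.276, 0.276], [0.942, 0.284], [0.91]])

# Population of 100,000 observations.
X_pop = rng.exponential(size=(100_000, 8))
Y_pop = mlp_forward(X_pop, *unpack(w_true)) + rng.normal(scale=0.001, size=100_000)

# Repeatedly draw 5000 observations, refit, and summarise the estimates.
draws = []
for _ in range(200):                        # number of resamples is illustrative
    idx = rng.choice(100_000, size=5_000, replace=False)
    w0 = w_true + rng.normal(scale=0.1, size=w_true.size)   # cheap warm start
    draws.append(fit_mlp(X_pop[idx], Y_pop[idx], w0))
draws = np.asarray(draws)

se = draws.std(axis=0, ddof=1)                    # bootstrap standard errors
ci = np.percentile(draws, [2.5, 97.5], axis=0)    # 95% confidence intervals
```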

Table 1 Results using bootstrapping method
Table 2 Results using asymptotic properties

6 Conclusion

Neural networks embody a very flexible class of assumptions about the structural form of the unknown function F. In this paper, we have used nonlinear regression techniques to explain the network through statistical analysis. The statistical procedures usable for model building in neural networks are the significance tests of parameters, through which an optimal network architecture can be established. In our opinion, the transparent neural network is a major requirement for diagnosing a neural network architecture that not only approximates the unknown function but also explains the network features well through the assumptions of statistical nonlinear modelling. As a next step, we would like to investigate deep neural networks based on similar concepts.