1 Introduction

When artificial intelligence (AI) is classified into four waves, deep neural network (DNN)-based algorithms, for example, convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs), correspond to the second wave, which focuses on improving prediction ability by learning from large amounts of data. Despite their excellent recognition performance, DNNs are data-hungry and require large amounts of training data. Moreover, because a DNN learns through error backpropagation, the resulting black box model cannot explain either its internal structure or its inference results. The current generation of AI systems is opaque, non-intuitive, and difficult for people to understand because these systems cannot explain their decisions and actions to users. Therefore, the third wave of AI requires the ability to observe the cause and effect of reasoning in a machine learning model, and as a result, research on explainable AI (xAI) or interpretable machine learning (IML) has become necessary [1]. xAI is essential for decision-making because users should be able to understand AI decisions, trust the results, and manage such information effectively.

xAI technologies can be broadly divided into transparent design and post-hoc explainability. An AI model is considered transparent if its structure is understandable by itself. Transparent AI models exhibit one or more levels of model transparency (i.e., simulatability, decomposability, and algorithmic transparency) [2]. Representative transparent models include decision trees, k-nearest neighbors, and Bayesian models. These methods have the advantage of using few variables whose relationships are readable; however, their prediction performance is lower than that of a DNN. In terms of prediction performance, black box models such as DNNs outperform simple machine learning algorithms. However, because black box models do not provide explainability, the goal of a post-hoc explanation is to build a separate explainable subsystem that explains how the black box model produces predictions for its inputs, while leaving the black box model itself unchanged.

Post-hoc explainability can be further divided into model-agnostic and model-specific methods [1, 2]. Because model-agnostic methods are not tied to a specific type of ML model, they are suitable for more general-purpose applications. Among the model-agnostic approaches, we focus on model simplification, in which a complex model is approximated by a transparent model with fewer parameters. Because a simplified model that imitates a complex model has some properties of model transparency, it can serve as an explanation (explanation by simplification), and it has the advantage of preserving the prediction performance of the original model. Explanation by simplification refers to rebuilding an entirely new system based on the trained model to be explained. Simplified models typically attempt to reduce complexity while keeping prediction performance similar to that of the original model and optimizing their functional similarity to it [2]. In addition, a simplified model can describe the feature relevance for both training and test data.

Explanation by simplification is the most widely applicable technique in the category of post-hoc model-agnostic methods, regardless of the complexity of the black box model [2]. Many recent xAI studies have addressed model simplification, suggesting that this approach will continue to play a central role in xAI. Related studies on explanation by simplification are reviewed below.

Bastani et al. [4] proposed model extraction to interpret the overall reasoning process of a model. Given a model f, the proposed approximation produces an interpretable model T such that T(x) ≈ f(x). This method uses a decision tree as T because decision trees are established as highly interpretable. However, an interpretable decision tree tends to overfit and performs worse than a complex model. Tan et al. [3] proposed a model distillation algorithm called distill-and-compare, in which a transparent student model is trained to mimic the risk scores assigned by a black box teacher model in order to gain insight into the black box model. However, this method does not report how the prediction performance and feature contributions of the mimic model change with the degree of distillation, which is an important measure of model simplification.

Among simplification methods for DNNs that use knowledge distillation, Zagoruyko and Komodakis [5] defined attention for CNNs and improved the performance of a student CNN by having it mimic the attention maps of a powerful teacher network. Similarly, Xu et al. [6] introduced DarkSight, a dimension reduction technique for interpreting deep classifiers based on knowledge distillation. DarkSight matches the dark knowledge of the student and teacher and compresses the black box teacher classifier into a simple, interpretable student classifier. However, because these methods still compress an existing DNN into a shallower DNN, the transparency of the resulting model is limited. Kim et al. [7] proposed a rule removal method for analyzing and simplifying a black box deep random forest (RF). The feature contribution measures the impact of a feature on the decision-making process of the rule set, and the black box model is simplified by selecting the top-k important rules based on their feature contributions. As a result, the simplified model has fewer parameters and rules than the original model. However, because this method relies on a traditional tree rule evaluation method, the reliability of rule removal is weak. Kim and Boukouvala [8] investigated the effectiveness of a subset selection method for developing a surrogate regression model that balances accuracy and complexity. Subset selection produces a sparse regression model by selecting only a subset of the original features, which are linearly combined to produce different surrogate models. However, this method incurs a high computational cost for feature selection and parameter identification, and its performance degrades as the dimension of the problem increases. As an application of model simplification, Kim et al. [9] also proposed a lightweight pupil tracking algorithm for on-device ML that uses a fast and accurate cascade deep regression forest instead of a DNN. The pupil position is estimated layer by layer in a regression forest structure, and each regression forest is simplified using a rule distillation algorithm that selects the top-k significant rules. The goal of this algorithm is to create a more transparent and adaptable model for on-device ML systems while maintaining accurate pupil tracking. However, the higher the distillation ratio, the more the model overfits or its performance deteriorates.

The model simplification methods introduced so far have the following limitations. 1) Surrogate models are still not transparent because they contain many unnecessary rules or layers. 2) The higher the model distillation ratio, the more the surrogate model overfits and the less its feature relevance is preserved. 3) The model distillation algorithms are heuristic or rely on traditional rule contributions.

In this study, we propose a new xAI method called the lightweight surrogate random forest (L-SRF), which simplifies a black box teacher model by decomposing it and, at the same time, increases transparency. Similar to distill-and-compare [3], the L-SRF can replace an existing heavy, deep, high-performing complex teacher model while maintaining the performance of the black box model. With an L-SRF, it is possible to analyze the features that affect inference and to explain how the L-SRF model structure operates during inference. In addition, our approach works with any model family and is independent of the implementation.

In our initial study [10], we briefly introduced an SRF that simplifies the black box teacher, considering model size and prediction performance, for classification problems. In this study, we explain in detail how the black box complex model is distilled and rebuilt into an explainable simplified structure, and we demonstrate the efficiency of the L-SRF model in terms of transparency and accuracy.

2 Surrogate random forest

DNN models achieve higher performance as they become deeper and heavier, but they are difficult to explain. In addition, owing to their large number of parameters, memory usage increases and inference speed decreases. To create a model that is explainable and lightweight, surrogate models based on the teacher–student (T-S) framework [3, 5, 6, 11] have been introduced; they construct a shallow student model that is smaller than the teacher model while maintaining performance similar to that of the deep and wide teacher. Approaches for creating a surrogate model can be divided into two types depending on the user's goal. If the focus is only on making the model lightweight, the complex teacher model can be rebuilt as a transparent model such as a decision tree. This approach has the advantage that the model itself is interpretable and transparent; however, the performance of the surrogate model is much lower than that of the teacher model when there are many classes and the feature vectors are high-dimensional. The other approach is to make the surrogate model itself a gray box. With this approach, the performance of the model is similar to that of the teacher model, and the feature relevance and importance that contribute to the model's decisions can be inferred. In this case, a random forest (RF) [12], gradient boosting method (GBM) [13], XGBoost [14], or CatBoost [15] is used as the surrogate model.

In this paper, we propose an L-SRF model that uses the T-S framework to maintain the performance of a complex black box model with fewer parameters. In addition, instead of a typical post-hoc method that must maintain both a black box model for prediction and a surrogate model for explanation, this study aims to achieve prediction and explanation simultaneously with a single L-SRF. GBM, XGBoost, CatBoost, and RF are the main student models used to create an explainable surrogate model. Unlike GBM, XGBoost, and CatBoost, an RF preserves the properties of the rules that make up each tree and is therefore more effective at eliminating unnecessary rules while maintaining the tree structure. By contrast, boosting-based models build each tree to fit the gradients of the preceding ensemble, so their rules depend on one another, which makes it difficult to apply rule distillation based on the characteristics of individual rules. To create a surrogate model that performs well, it is important to develop a teacher model with excellent performance. Therefore, this paper uses automated machine learning [14] to create a DNN-based teacher model with the highest performance for a given dataset. By following the T-S framework and our proposed rule distillation algorithm, it is possible to create a reduced L-SRF model that inherits the characteristics of the teacher.

The process of generating a student RF model based on the T-S framework is shown in Fig. 1. The training dataset is divided into dataset A for training the teacher model and dataset B for training the student model. First, the teacher DNN model is obtained by applying AutoML (http://AutoML.org) to dataset A, which is labeled with 0s and 1s (hard targets). Then, the unlabeled dataset B is fed into the trained teacher DNN, and the soft target, that is, the class-specific probability output by the softmax, is assigned to dataset B as its label. Next, the student RF model is trained on dataset B labeled with the soft targets. Among student RFs with different numbers and depths of trees, the one whose performance is most similar to that of the teacher is selected. During this process, to prevent the RF from overfitting, various RFs are trained using M-fold cross-validation, and the RF with the best performance is selected as the final student RF. Because its training data are labeled with soft targets, the selected student RF can learn the inter-class relationships captured by the teacher DNN. The student RF model created through T-S framework learning is called the SRF model.
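The T-S procedure above can be sketched with scikit-learn as follows. This is a minimal illustration under assumptions that are not taken from the paper: an `MLPClassifier` stands in for the AutoGluon-selected teacher DNN, the student RF is fitted to the soft targets as a multi-output `RandomForestRegressor` (scikit-learn's `RandomForestClassifier` accepts only hard labels), and "most similar to the teacher" is approximated by the cross-validated mean squared error between the RF outputs and the soft targets. The function name `build_srf` and the grid values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.neural_network import MLPClassifier


def build_srf(X, y, m_folds=5, seed=0):
    """Hypothetical T-S pipeline: teacher on dataset A, student RF on soft-labeled dataset B."""
    # Split the training data into dataset A (teacher) and dataset B (student);
    # the hard labels of dataset B are discarded.
    X_a, X_b, y_a, _ = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed
    )

    # Stand-in teacher (assumption): a small MLP instead of an AutoML-selected DNN.
    teacher = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=seed)
    teacher.fit(X_a, y_a)

    # Soft targets: the teacher's class-probability outputs for dataset B.
    soft_b = teacher.predict_proba(X_b)

    # Student RF (assumption: multi-output regression on the soft targets);
    # the number and depth of trees are selected with M-fold cross-validation.
    search = GridSearchCV(
        RandomForestRegressor(max_features="sqrt", random_state=seed),
        param_grid={"n_estimators": [10, 50], "max_depth": [5, 7, None]},
        cv=KFold(n_splits=m_folds, shuffle=True, random_state=seed),
        scoring="neg_mean_squared_error",
    )
    search.fit(X_b, soft_b)
    return teacher, search.best_estimator_


X, y = make_classification(n_samples=600, n_features=10, random_state=0)
teacher, srf = build_srf(X, y)
proba = srf.predict(X)  # one class-probability vector per sample
agreement = np.mean(proba.argmax(axis=1) == teacher.predict(X))
print(f"student/teacher agreement: {agreement:.3f}")
```

Because each regression-tree leaf stores the mean of the soft targets that reach it, the RF outputs remain valid class-probability vectors, so `proba.argmax(axis=1)` gives the student's class prediction.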

Fig. 1

Teacher–student training framework: (a) the training dataset is divided into labeled dataset A and unlabeled dataset B, and the teacher DNN model is trained with dataset A; (b) unlabeled dataset B is applied to the trained teacher DNN, and a class-specific probability value is assigned to dataset B as its label; (c) soft-target dataset B is used to train the student RF model; (d) the RF with the best performance is selected as the final student surrogate RF using M-fold cross-validation

3 Lightness of SRF

SHapley Additive exPlanations (SHAP) [16] is a representative surrogate model-based feature relevance estimation method. With this method, predictions are made with a black box model, and feature relevance is estimated using a surrogate model. Therefore, explainable prediction requires using the black box model and the surrogate model at the same time. However, this typical post-hoc approach is difficult to use in a lightweight system because the combined model becomes excessively large. Therefore, our proposed L-SRF has the following goals: 1) the L-SRF does not maintain a separate black box model but preserves its prediction performance; 2) the model itself is more transparent than the initial surrogate model; and 3) the SRF is made lighter by eliminating only redundant or unnecessary rules, while the explainability of the feature relevance of the initial surrogate model is maintained. To further lighten the SRF model obtained from the T-S framework, we propose a rule distillation method based on the cross-entropy Shapley (CES) value.

3.1 Cross-entropy Shapley value

An RF is an ensemble of decision trees, and each decision tree is a set of rules, each of which is a path from the root through intermediate nodes to a leaf node [12]. We can reduce the rules of the RF by evaluating the contribution of every rule constituting the decision trees and eliminating the rules with low contributions. In this study, we use the Shapley value [17] to determine the contribution of each rule. In machine learning, the Shapley value is typically used to measure the contribution of an input feature to a model's output. It measures the difference in the model's output (e.g., accuracy) with and without a specific feature, averaged over combinations of the other features; the greater the difference, the higher the contribution assigned to that feature. In this study, instead of determining the contribution of input features, the Shapley value is used to determine the contribution of each rule in the SRF.
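To make the notion of a rule concrete, the sketch below enumerates every root-to-leaf path of a fitted scikit-learn `RandomForestClassifier` and pairs it with the class distribution stored at the leaf. It only illustrates the rule representation assumed here; the function name `extract_rules` and the string format of the conditions are hypothetical, and the paper's own rule representation may differ. For a regression-based SRF trained on soft targets, the leaf vector would instead be read from `tree.value[node][:, 0]`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def extract_rules(forest, feature_names):
    """Return one (tree index, conditions, class distribution) triple per leaf."""
    rules = []
    for t_idx, estimator in enumerate(forest.estimators_):
        tree = estimator.tree_
        stack = [(0, [])]  # (node id, conditions collected from the root)
        while stack:
            node, conds = stack.pop()
            left, right = tree.children_left[node], tree.children_right[node]
            if left == right:  # both children are -1 only at a leaf
                dist = tree.value[node][0]
                rules.append((t_idx, conds, dist / dist.sum()))
                continue
            name = feature_names[tree.feature[node]]
            thr = tree.threshold[node]
            stack.append((left, conds + [f"{name} <= {thr:.3f}"]))
            stack.append((right, conds + [f"{name} > {thr:.3f}"]))
    return rules


iris = load_iris()
forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)
forest.fit(iris.data, iris.target)
rules = extract_rules(forest, iris.feature_names)
print(len(rules), "rules; first rule:", " AND ".join(rules[0][1]), rules[0][2])
```

The number of rules equals the total number of leaves in the forest, which is why the rule count of an SRF grows quickly with the number and depth of its trees.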

Because the Shapley value determines the contribution of individual features, the algorithm must be modified to evaluate the contribution of the rules constituting the SRF. Therefore, we propose a new CES value to evaluate the contribution of each rule of the SRF. Whereas the existing rule elimination method evaluates the prediction accuracy of each rule of the tree [10], the proposed CES value evaluates rule contributions in more detail by considering the class probabilities of a particular rule used in the tree. First, suppose that the rule set R of the SRF consists of N rules. Here, \(r_j\) is the j-th rule of R, and \(\tilde{\mathrm{R}}\) is a subset of R that does not contain \(r_j\) (and thus has at most N − 1 rules). The contribution of \(r_j\) with respect to the subset \(\tilde{\mathrm{R}}\) is calculated from the classification probability \(p_i\) of each class \(i = 1, \ldots, c\). Specifically, \(T_{CE}(\tilde{\mathrm{R}}, r_j)\) is the cross-entropy between the class probabilities predicted by \(\tilde{\mathrm{R}}\) with and without the rule \(r_j\):

$$ T_{CE}(\tilde{\mathrm{R}}, r_{j}) = -\sum\limits_{i=1}^{c}p_{i}(\tilde{\mathrm{R}} \cup r_{j})\log(p_{i}(\tilde{\mathrm{R}})) $$
(1)

The CES value \(\Phi(r_j)\) of the j-th rule is the Shapley-weighted sum of its contributions over all possible subsets of the remaining rules:

$$ \Phi(r_{j}) = \sum_{\tilde{\mathrm{R}} \subseteq \{r_{1},\ldots,r_{N}\} \setminus \{r_{j}\}} \frac{|\tilde{\mathrm{R}}|!\,(N-|\tilde{\mathrm{R}}|-1)!}{N!}\, T_{CE}(\tilde{\mathrm{R}}, r_{j}) $$
(2)

By measuring the CES value of each rule of the initial SRF, the SRF is reduced by eliminating the rules with low contributions.
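Equations (1) and (2) can be evaluated exactly for a handful of rules by enumerating every subset \(\tilde{\mathrm{R}}\). The sketch below is illustrative only and rests on two assumptions that the text does not specify: the class distribution \(p(\tilde{\mathrm{R}})\) of a rule subset is taken as the mean of the rules' leaf class-probability vectors, and the empty subset is assigned a uniform distribution. A small `eps` guards the logarithm, and all function names are hypothetical. Because the enumeration visits \(2^{N-1}\) subsets per rule, this exact form is feasible only for very small N.

```python
from itertools import combinations
from math import factorial

import numpy as np


def subset_proba(rule_probs, subset, n_classes):
    """p(R~): mean leaf distribution of the rules in the subset (assumption)."""
    if len(subset) == 0:
        return np.full(n_classes, 1.0 / n_classes)  # uniform for the empty subset
    return rule_probs[list(subset)].mean(axis=0)


def t_ce(rule_probs, subset, j, n_classes, eps=1e-12):
    """Eq. (1): -sum_i p_i(R~ with r_j) * log p_i(R~)."""
    p_with = subset_proba(rule_probs, subset + (j,), n_classes)
    p_without = subset_proba(rule_probs, subset, n_classes)
    return float(-np.sum(p_with * np.log(p_without + eps)))


def ces_values(rule_probs):
    """Eq. (2): exact CES value of every rule by enumerating all subsets."""
    n_rules, n_classes = rule_probs.shape
    phi = np.zeros(n_rules)
    for j in range(n_rules):
        others = [k for k in range(n_rules) if k != j]
        for size in range(n_rules):  # |R~| = 0, ..., N - 1
            weight = factorial(size) * factorial(n_rules - size - 1) / factorial(n_rules)
            for subset in combinations(others, size):
                phi[j] += weight * t_ce(rule_probs, subset, j, n_classes)
    return phi


# Four hypothetical rules of a binary SRF, each given by its leaf class distribution.
rule_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.2, 0.8]])
print(ces_values(rule_probs))
```

Under these assumptions every term of Eq. (1) is non-negative, and the Shapley weights over all subsets sum to 1, so \(\Phi(r_j)\) is a weighted average of cross-entropies.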

3.2 Rule distillation using mini-grouping

In general, an SRF contains tens of thousands to millions of rules, depending on the number and depth of its trees. Checking the contribution of each individual rule across the entire rule set not only requires a long processing time but can also cause the model to overfit. Therefore, in this study, a random mini-grouping method was devised to minimize the overfitting caused by rule distillation based on individual contribution checks. In random mini-grouping, the rules of the SRF are randomly grouped into K mini-groups, as shown in Fig. 2b, and the contribution of each mini-group is evaluated by estimating its CES value. The CES value of each mini-group is measured against the remaining mini-groups in the same manner as the original Shapley method (Fig. 2c). This process is repeated H times, and the final contribution of each rule is determined by averaging the CES values of the mini-groups to which it belonged, as shown in Fig. 2d. Finally, by eliminating the rules with low contributions, the model size can be greatly reduced while maintaining the existing prediction performance.
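The random mini-grouping loop can be sketched as follows. This is a hypothetical illustration, not the authors' Algorithm 1. It assumes that a set of rules predicts the mean of its rules' leaf class distributions (uniform for an empty set), treats each mini-group as one Shapley player evaluated exactly against the remaining mini-groups, and assigns every rule the CES value of the mini-group it falls in before averaging over the H iterations. Here `n_groups` plays the role of K, `n_iter` the role of H, and `keep_ratio` corresponds to the rule distillation rate in the notation SRF (0.7), SRF (0.3), and so on; all function names are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def union_proba(rule_probs, groups, members):
    """Class distribution of the union of the given mini-groups (assumption: mean)."""
    idx = [r for g in members for r in groups[g]]
    if not idx:
        return np.full(rule_probs.shape[1], 1.0 / rule_probs.shape[1])
    return rule_probs[idx].mean(axis=0)


def group_ces(rule_probs, groups, eps=1e-12):
    """Exact CES value of each mini-group measured against the remaining mini-groups."""
    k = len(groups)
    phi = np.zeros(k)
    for g in range(k):
        others = [h for h in range(k) if h != g]
        for size in range(k):
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for sub in combinations(others, size):
                p_with = union_proba(rule_probs, groups, sub + (g,))
                p_without = union_proba(rule_probs, groups, sub)
                phi[g] += weight * -np.sum(p_with * np.log(p_without + eps))
    return phi


def mini_group_scores(rule_probs, n_groups, n_iter, seed=0):
    """Average, over n_iter random groupings, of the CES value of each rule's mini-group."""
    rng = np.random.default_rng(seed)
    n_rules = len(rule_probs)
    totals = np.zeros(n_rules)
    for _ in range(n_iter):
        groups = np.array_split(rng.permutation(n_rules), n_groups)
        phi = group_ces(rule_probs, groups)
        for g, members in enumerate(groups):
            totals[members] += phi[g]  # each rule is in exactly one mini-group
    return totals / n_iter


def distill(rule_probs, keep_ratio, n_groups, n_iter, seed=0):
    """Indices of the rules kept after removing those with the lowest average CES."""
    scores = mini_group_scores(rule_probs, n_groups, n_iter, seed)
    n_keep = max(1, int(round(keep_ratio * len(rule_probs))))
    return np.sort(np.argsort(scores)[::-1][:n_keep])


# Hypothetical leaf distributions of 20 rules over 3 classes.
rule_probs = np.random.default_rng(1).dirichlet(np.ones(3), size=20)
print(distill(rule_probs, keep_ratio=0.7, n_groups=4, n_iter=3))
```

Exact enumeration costs \(2^{K-1}\) subset evaluations per mini-group, so this sketch is practical only for small K; for K = 50, the Shapley sum would have to be approximated, for example by sampling permutations.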

Fig. 2

Rule distillation process using random mini-grouping and CES: (a) a rule set R consisting of rules extracted from the SRF; (b) mini-groups randomly generated from R; (c) the CES value measured for each mini-group, a process repeated H times; (d) the average CES value calculated for each rule belonging to the mini-groups. Rules with small average CES values are eliminated

Algorithm 1 describes the rule distillation process through random mini-grouping using the CES value of the SRF model. Equation 3 of Algorithm 1 was modified to calculate the CES value between the mini-group set and each mini-group g instead of between a subset \(\tilde{\mathrm{R}}\) and an individual rule r.

4 Materials and methods

4.1 Datasets

The UCI repository [18] and the Penn Machine Learning Benchmarks (PMLB) [19] provide several datasets for testing machine learning and intelligent systems. In this paper, we evaluate the effectiveness of the proposed method using the Adult Income dataset from UCI and the Phoneme, Car, Mushroom, Chess, and Mfeat factors datasets from PMLB. The Adult Income dataset is used to predict whether an individual's income exceeds $50,000 per year based on the demographics of adults aged 16 and older. It contains 48,842 demographic records of people who participated in the 1994 census, with 14 attributes such as age, gender, occupation, workclass, and education. The Phoneme dataset is used to distinguish between nasal and oral sounds and contains 5,404 samples with six attributes. The Car dataset evaluates cars according to a conceptual structure that includes the estimated safety of the car, trunk size, number of passengers, and number of doors, and it consists of 1,728 samples with seven attributes. The Mushroom dataset contains physical properties for classifying mushrooms as poisonous or edible and consists of 8,124 samples with 20 attributes. The Chess dataset is used to predict the outcome of a chess endgame in which White has only a king and a rook and Black has only a king and a pawn; it consists of 3,196 samples with 20 attributes. The Mfeat factors dataset is used to recognize handwritten numerals (0-9); 200 instances per class (2,000 samples in total) were digitized as binary images and are described by 216 attributes. Each dataset was used for both training and testing the models. The UCI Adult Income dataset was divided at a ratio of approximately 7:3 following the official training/testing split, and the other datasets were divided into five folds.

4.2 Toolkit and library

In this study, the AutoGluon [20] toolkit is used to generate the AutoML-based teacher model, and Scikit-learn and Python are used to implement the surrogate RF model. In addition, we use the SHAP package in Python to visualize the influence of the feature vectors on the output.

5 Experimental results

Selecting an accurate teacher model is one of the essential factors in the T–S framework because the performance of the student model largely depends on the performance of the teacher model. Various machine learning algorithms can be used as the teacher model, but in this study, AutoGluon, an AutoML toolkit [20] for deep learning, is used. AutoML is highly adaptable to various real-world applications such as images, text, or tabular data, and can automatically utilize the latest deep learning technologies without expert knowledge. In addition, AutoML makes it easy to utilize automatic hyperparameter tuning, model selection/architecture discovery, and data processing. For the student model SRF, Scikit-learn and Python were applied.

5.1 Hyper-parameter evaluation for model simplification

The mini-grouping process requires two hyper-parameters: the number of mini-groups and the number of grouping iterations. Because the size and performance of the L-SRF depend on these two parameters, their optimal values must be found to create a lightweight and generalized L-SRF. First, to find the optimal number of iterations, we measured the F1-score while changing the number of iterations and the distillation rate of the initial SRF on the UCI Adult Income dataset, as shown in Table 1. Here, the maximum number of mini-groups was fixed at 50. If the maximum number of mini-groups is too large, overfitting may occur because too few rules are allocated to each mini-group. The initial SRF model created with the T-S framework retains 100% (1.0) of its rules before rule distillation is applied. Starting from the initial SRF, we repeatedly removed 10% (0.1) of the rules and evaluated the relative F1-score.

Table 1 Comparison of F1-score performance according to number of iterations for mini-grouping with rule distillation rate for the SRF using UCI Adult Income dataset

As shown in Table 1, the F1-score differs little with the number of iterations, which means that the number of iterations does not significantly affect the performance of the L-SRF. When the number of iterations is 3, the average F1-score across all rule distillation rates is slightly higher than in the other cases. Because increasing the number of iterations increases the training effort, when performance is similar, an effective approach is to keep the number of iterations as small as possible and instead optimize the number of mini-groups, which is the other parameter. Therefore, based on Table 1, we set the number of iterations to 3 and varied the number of mini-groups. Table 2 shows the resulting F1-scores for different rule distillation rates and numbers of mini-groups. As shown in Table 2, the highest F1-score is obtained when the number of mini-groups is 50 and the rule distillation rate is 0.7. In addition, SRF (0.7) achieves an F1-score approximately 0.04% higher than that of the initial SRF (1.0) without rule distillation, which suggests that unnecessary rules or rules with low contributions are effectively eliminated by the proposed algorithm.

Table 2 Comparison of F1-score performance according to number of mini-groups with rule distillation rate for the SRF using UCI Adult Income dataset

The purpose of the proposed L-SRF is to design a lightweight surrogate model that can operate on low-specification devices while maintaining the performance of the teacher model. Therefore, we evaluated how the number of parameters and operations decreased with the rule distillation rate of the SRF, and we also observed the corresponding change in performance. In this experiment, the number of iterations was set to 3 and the number of mini-groups to 50, based on the results of the previous experiments.

As shown in Table 3, the number of SRF parameters decreased along with the rule distillation rate. In particular, when 70% of the rules were eliminated from the complete rule set (SRF (0.3)), the number of parameters was reduced by approximately 53% while the F1-score remained at the same level. These experiments show that the proposed model simplification method can effectively reduce the size of the model while maintaining the existing F1-score. Moreover, for SRF (0.1), the number of rules is reduced to about 300, so the variables included in the rules can be read and the rule set is small enough for humans to manage without external assistance.

Table 3 The change in model size according to the rule distillation rate with the UCI Adult Income dataset

5.2 Surrogate model comparison

Representative surrogate models used in machine learning include GBM [13], XGBoost [14], CatBoost [15], and RF [12]. In GBM, the gradient indicates the weaknesses of the classifier learned so far, and each new tree learns to compensate for them. GBM has an excellent boosting ability, but training is slow and prone to overfitting. XGBoost was proposed to overcome these shortcomings of GBM; it is faster than GBM and provides regularization and early stopping to prevent overfitting. CatBoost provides a novel gradient boosting scheme that reduces overfitting and supports categorical features natively, which allows fast parameter tuning. To test their suitability as surrogates, we first trained the four surrogate models on the outputs of the same teacher DNN using the Adult Income dataset. All four models consisted of 10 trees with a maximum depth of 7, and the number of features considered when finding the best split at each tree node was set to \(\sqrt{d}\), where d is the number of input features.
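The shared surrogate configuration (10 trees, maximum depth 7, \(\sqrt{d}\) candidate features per split) can be expressed with scikit-learn's RF and GBM as sketched below; XGBoost and CatBoost are omitted to keep the example dependency-free. The fitting target (the teacher's positive-class probability for a binary task), the fidelity measure (F1-score of the surrogate's thresholded output against the teacher's hard decision), and the use of the total tree node count as a proxy for the number of parameters are all assumptions made for illustration; the text does not state how parameters were counted. The function names and the stand-in teacher output are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import f1_score


def count_nodes(model):
    """Total number of tree nodes, used here as a rough proxy for model size."""
    est = model.estimators_
    trees = est.ravel() if isinstance(est, np.ndarray) else est  # GBM: 2-D array, RF: list
    return int(sum(t.tree_.node_count for t in trees))


def fit_surrogates(X, p_teacher, seed=0):
    """Fit RF and GBM surrogates to the teacher's positive-class probability."""
    common = dict(n_estimators=10, max_depth=7, max_features="sqrt", random_state=seed)
    teacher_label = (p_teacher >= 0.5).astype(int)
    results = {}
    for name, model in [("RF", RandomForestRegressor(**common)),
                        ("GBM", GradientBoostingRegressor(**common))]:
        model.fit(X, p_teacher)
        pred = (model.predict(X) >= 0.5).astype(int)
        results[name] = (f1_score(teacher_label, pred), count_nodes(model))
    return results


X, _ = make_classification(n_samples=500, n_features=16, random_state=0)
p_teacher = 1.0 / (1.0 + np.exp(-X[:, :4].sum(axis=1)))  # stand-in teacher output
results = fit_surrogates(X, p_teacher)
print(results)  # {name: (F1 against the teacher, total node count)}
```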

As shown in Table 4, the F1-score of GBM was 0.42% lower than that of the teacher model, whereas those of XGBoost and CatBoost were only 0.1% and 0.19% lower, respectively, showing no significant difference from the teacher. The F1-score of the SRF was 0.39% lower than that of the teacher model and 0.09% lower than that of XGBoost. In terms of the number of parameters, GBM, XGBoost, and CatBoost require approximately 1.84, 1.71, and 1.91 times as many parameters as the SRF, respectively. From these results, we confirmed that the SRF is a suitable model for simplification because it inherits the teacher's performance more closely than GBM, remains close to XGBoost and CatBoost, and uses fewer parameters than all three boosting-based methods.

Table 4 Comparison of precision, recall, F1-score, and number of parameters for four surrogate models trained with a teacher model on the UCI Adult Income dataset

5.3 Comparison with machine learning

To evaluate the performance of the proposed L-SRF, we compared it with RF [12], ExtRa [21], k-NN [22], SVMs [23], gcForest [24], AdaBoost [25], GBM [13], XGBoost [14], LGBM [26], CatBoost [15], NgBoost [27] (a multi-parameter boosting algorithm), KiGB [28] (a unified framework for knowledge-intensive gradient boosting), and the AutoGluon-based teacher DNN on the additional Phoneme, Car, Mushroom, Chess, and Mfeat factors datasets from PMLB. The evaluation was conducted using five-fold cross-validation. Where applicable, all 13 methods used the same number of trees, maximum depth, and number of features as in the previous experiments.

As shown in Table 5, the teacher DNN performed best on all datasets. Excluding the teacher DNN, XGBoost showed the best performance on the Car and Chess datasets. The five methods that do not use boosting ([12, 21–24]) showed lower overall accuracy than the boosting-based methods ([13, 15, 25, 26, 28]). Three boosting-based methods (CatBoost [15], NgBoost [27], and KiGB [28]) showed similar results on the five datasets; however, their accuracy still differed from that of XGBoost [14] by 1–4%. XGBoost was about 3% more accurate overall than L-SRF (0.7), but it required about twice as many parameters. In particular, the L-SRF (0.7) model obtained through rule distillation performed similarly to the original SRF (1.0) model generated from the teacher DNN on all datasets except Phoneme. These results show that the proposed CES value and random mini-grouping of L-SRF effectively eliminate unimportant rules that degrade performance, without sacrificing accuracy. Moreover, although L-SRF (0.4) used 1.34 times fewer parameters than L-SRF (0.7), its accuracy was similar to or higher than that of gcForest [24] and LGBM [26] on the Car, Mushroom, and Chess datasets.

Table 5 Performance comparison with machine learning models using PMLB datasets

The experimental results show that the L-SRF model based on the T-S framework maintains performance similar to that of models trained directly on the data while using a small number of parameters.

5.4 Visualization

Among post-hoc xAI methods, the biggest advantage of an RF-based surrogate model over DNN-based methods is that it can measure feature relevance. The contribution of each feature to the output of L-SRF was analyzed using SHapley Additive exPlanations (SHAP) [16], an xAI technique that measures feature relevance. In other words, we use SHAP to quantify how important the features of the L-SRF model are to its results, and based on this, we verify that the proposed method achieves model simplification while maintaining feature relevance. In addition, a comparison of the SHAP results of the original SRF (1.0) and L-SRF (0.2) shows that even a lightweight model does not overfit and preserves the important rules, thereby maintaining feature relevance similar to that of the original model. To compare the SHAP values of the original SRF (1.0) and the simplified L-SRF (0.2) more easily, the local importance of individual features was measured on the UCI Adult Income dataset and visualized, as shown in Fig. 3.

Fig. 3

Visualization of the magnitude of influence of the input features on the output according to model simplification, based on SHAP values for the UCI Adult Income dataset. The x-axis represents the mean SHAP value (global importance) and the SHAP value (local importance), and the y-axis represents the 14 input features. (a) The initial SRF (1.0) model without rule distillation; (b) the rule-distilled L-SRF (0.4) model. The feature importances of the two models are very similar

As shown in Fig. 3, the two models show mostly similar patterns of local importance. The features that most affect the output and show similar patterns in both models are, in order, “EducationNum”, “Race”, and “CapitalGain”, and the positive and negative effects of the feature values also show similar patterns. Although many rules were eliminated from the initial SRF, the L-SRF (0.2) showed a similar pattern in terms of feature correlation, indicating that unnecessary rules were effectively eliminated through the proposed CES and mini-grouping.

Second, we visualized the correlations between features using SHAP dependence plots to check whether they are preserved even in the simplified L-SRF (0.2) model. A comparison of the correlations between all features showed that “EducationNum” has the highest correlation with the other features. Figure 4 shows the SHAP values for two feature pairs that are highly correlated with “EducationNum”: “Age”–“EducationNum” and “WorkClass”–“EducationNum”. These results suggest that the proposed L-SRF model achieves consistent feature relevance even after simplification because it maintains the correlations between features even in a lightweight model. Therefore, the proposed L-SRF method can improve model transparency through model simplification while maintaining feature relevance, which is a basic property of xAI explainability.

Fig. 4

Visualization of the correlation between features estimated from L-SRF (0.4) on the UCI Adult Income dataset: (a) SHAP dependence plot between the “Age” and “EducationNum” features; (b) SHAP dependence plot between the “WorkClass” and “EducationNum” features

6 Conclusion

In this paper, among several xAI approaches, we proposed a new L-SRF algorithm that increases the transparency of a complex black box model through model simplification and analyzes the features that influence predictions through feature relevance. We confirmed that the proposed L-SRF method can substantially compress the model while maintaining prediction performance comparable to that of the existing complex model. In particular, by applying the proposed mini-grouping and CES to an RF surrogate, instead of using XGBoost, GBM, or CatBoost, we designed a lightweight surrogate model that effectively reduces the number of rules while maintaining both prediction performance and feature relevance.

Experiments on various datasets showed that the proposed L-SRF achieves accuracy similar to that of XGBoost, GBM, and CatBoost. In terms of model size, the proposed method effectively eliminated less important rules, thereby significantly reducing the model size while avoiding the overfitting caused by model reduction. In future research, we will improve the L-SRF model so that it can be applied to a variety of data, including images and videos, and deploy it on an embedded device to test its feasibility for real problems. Furthermore, because the L-SRF is still less accurate than XGBoost even after unimportant rules are removed, we plan to devise a lightweight version of XGBoost by adapting the proposed rule distillation method to XGBoost.