Introduction

Obtaining the ideal surface quality is usually a basic technical requirement of machining, and the surface profile and roughness of a component are two important indicators of that quality. Surface roughness, an indicator widely used to characterize the surface properties of a workpiece, has an important impact on the performance of mechanical parts as well as on production cost. In polishing, reducing the surface roughness of the workpiece and improving its surface quality are the main goals of all kinds of polishing processes. To reduce the surface roughness of a mechanical part without changing its geometry, various polishing operations are often used, and these operations can be time-consuming and expensive. Therefore, in the field of polishing, more and more researchers focus on surface roughness. Optimizing and predicting surface roughness can effectively increase productivity and reduce costs, which gives this topic important research significance and application value for machining processes.

Numerous studies have shown that the surface roughness produced during machining is influenced by several machining parameters. Different machining methods, such as turning, milling, and polishing, influence surface roughness in different ways. The reduction of surface roughness is affected by the properties of the workpiece and by various uncontrollable factors, making it very difficult to obtain an accurate prediction model directly (Lu, 2008). Therefore, many researchers have worked on the optimization of various machining parameters and their effects on the prediction of surface roughness. The literature shows that research methods on the surface roughness of polishing fall into two main categories: classical methods based on experimental and mathematical analysis, and methods based on artificial intelligence (AI). The classical methods include experimental analysis (Huang et al., 2022), statistical analysis (Solheid et al., 2020), and response surface methodology (RSM) (Nguyen et al., 2020; Jian et al., 2022). Among AI-based methods, part of the research applies neural network algorithms to surface roughness prediction. For example, Schneckenburger et al. (2020) developed an artificial neural network (ANN) model to predict the surface roughness and shape accuracy of glass after polishing, which helped reduce the number of polishing iterations and thus the production time. Besides, swarm intelligence algorithms such as PSO (Wang et al., 2020) and GA (Khalick Mohammad et al., 2017) are common techniques in this field. In recent years, the trend has been toward machine learning techniques. As a result, the combination of machine learning algorithms and swarm intelligence algorithms has become a common approach to optimizing machining parameters, and this type of method can effectively reduce surface roughness (Fan et al., 2022a; Khalick Mohammad et al., 2017).

The above analysis shows that, for classical methods, simple mathematical models often cannot achieve the desired prediction accuracy. Although AI-based methods can improve the prediction accuracy of the model to a certain extent, complex algorithms such as neural networks often require a large amount of data to train the model and ensure the reliability of the prediction results. However, in the actual machining process, only a small amount of experimental data can be collected for each experiment, owing to constraints such as the experimental environment, technology, and cost budget. Too few data samples can easily lead to under-fitted neural network models, thus affecting the accuracy and reliability of the prediction results. In addition, black-box prediction models based on neural networks cannot explicitly explain the hidden correlation between the control parameters and the predicted values, and thus fail to meet the needs of practical production.

Ensemble learning is a framework that accomplishes learning tasks by building and combining multiple base learners. By combining multiple weak learners with simple structures, the framework often achieves better generalization performance than a single learner. Ensemble learning, a widely used AI technique, provides new ideas for our research. On the one hand, weak learners with simple structures require far less data than complex neural network algorithms; on the other hand, the ensembled strong learner can achieve higher prediction performance. In addition, given the successful application of swarm intelligence algorithms to machining parameter optimization, a small number of studies have applied GA to the prediction of surface roughness. For example, Wang et al. (2022a, b) proposed a robust surface roughness prediction model (ELGA) based on ensemble learning with GA, which can be used to predict the surface roughness of multi-jet polishing (MJP) of 3D-printed 316 L stainless steel. Inspired by this, we propose to combine a swarm intelligence algorithm with an ensemble learning algorithm for surface roughness prediction in abrasive water jet polishing (AWJP), since AWJP has a material removal mechanism similar to that of MJP (Wang et al., 2017a, b). Given the successful industrial applications of the differential evolution (DE) algorithm in recent years (Yuan et al., 2021; Ibrahim & Tawhid, 2022), we combine DE, an emerging swarm intelligence technique, with an ensemble learning approach. Like other swarm intelligence algorithms, DE is a stochastic optimization algorithm that simulates biological evolution. Compared with GA, DE retains the population-based global search strategy and reduces the complexity of genetic operations by using real-number encoding, simple difference-based mutation operations, and a one-to-one competitive survival strategy.

A novel ensemble framework for polished surface roughness prediction is designed in this paper. The method selects six classical regression algorithms, Random Forest (RF), Extreme Gradient Boosting (XGBoost), Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Regression (SVR), Gradient Boosting Regression (GBR), and Extra-Tree Regressor (ETR), as the base models and uses DE to search for the optimal weight assignment scheme across these models. Notably, we design a simplified encoding mechanism for the individuals of DE, tailored to the characteristics of polishing, which further improves the computational efficiency of the algorithm. The experimental results show that the ensemble framework further improves prediction accuracy on the low-dimensional, small-sample experimental data used in this study. Although the base models are prone to underfitting on such a small amount of experimental data, ensemble learning allows the advantages of each model to complement one another, so that the whole algorithm achieves high prediction accuracy even with small-sample data. In addition, we use the genetic programming (GP) based Evolutionary Forest (EF) algorithm for automatic feature construction in the data pre-processing stage to further improve the prediction performance and interpretability of the model.

The rest of this paper is organized as follows. Section Research progress of surface roughness prediction reviews the related research on surface roughness in polishing processes. Section ELDEA for surface roughness prediction describes the ELDEA method based on ensemble learning and DE for surface roughness prediction. Section Validation experiments gives a detailed experimental setup. Section Results and discussion compares and analyzes the performance of different prediction methods and provides interpretability analyses of the data features. Section Conclusions and future work summarizes our work and proposes future research directions.

Research progress of surface roughness prediction

By surveying research related to surface roughness, we have grouped it into two main categories: reducing surface roughness by optimizing machining parameters, and directly predicting surface roughness. These two approaches optimize surface roughness indirectly and directly, respectively, and have in common that both exploit the relationship between machining parameters and surface roughness. Currently, a large number of studies focus on the optimization and prediction of surface roughness in turning and milling, and less research has been done on polished surface roughness. Accordingly, this section reviews and analyzes the current status of polished surface roughness research.

Reducing surface roughness by optimizing machining parameters

Reviewing the previous studies, the methods for minimizing surface roughness by optimizing machining parameters can be divided into three main categories, namely, optimization methods based on experimental and mathematical analysis, optimization methods based on machine learning, and optimization methods based on swarm intelligence.

Before machine learning was widely used in manufacturing, the traditional approach of combining experimental models with RSM dominated research on surface roughness. Huai et al. (2017) developed a roughness ratio prediction model based on orthogonal experiments for the polishing of aviation engine blades. Chen et al. (2019) performed a regression analysis based on the results of central-composite-designed polishing experiments and established a prediction model for surface roughness and residual stress. In addition, a comprehensive analytical approach was proposed to optimize the process parameter intervals and obtain the globally optimal parameter intervals for improving surface quality during the polishing of Ti-6Al-4V blades.

With the development of swarm intelligence algorithms, researchers have started to apply them to the optimization of machining parameters. To achieve the optimum balance between high material removal rate (MRR) and low surface roughness, Chen et al. (2018) established a prediction model for MRR based on linear regression analysis of experimental data for the belt flap wheel polishing of TC4 aero-engine blisk blades and then solved a single-objective constrained optimization problem by using PSO. Fan et al. (2022b) proposed a novel finishing parameter optimization method considering dimensional accuracy for magnetic compound fluid finishing (MCFF), which makes use of a multi-objective PSO algorithm to optimize the finishing parameters.

In recent years, machine learning technology has developed rapidly and has been successfully applied in the field of manufacturing. Wei and Wu (2022) proposed a graphical model and a conditional variational autoencoder to extract the features of surface topography in the chemical mechanical polishing (CMP) process; the process variables and the extracted surface topography features are then fed into an ensemble-learning-based predictive model to predict the MRR. To achieve high-quality polishing of an M300 mold steel curved surface, Tong et al. (2019) first determined the degree to which various factors influence polished surface roughness, and the combination of parameters to be optimized, using the signal-to-noise ratio method, and then combined BP and PSO (PSO-BP) to optimize the polishing parameters. The results showed that the method can effectively reduce polished surface roughness. Wang et al. (2012) developed a series-parallel hybrid polishing machine tool based on elastic polishing theory, applied it to finish mould surfaces using bound abrasives, and optimized its process parameters by using a combination of GA and ANN.

It can be seen that most machine-learning-based surface roughness optimization methods are combined with swarm intelligence methods to further reduce the surface roughness of workpieces and improve their surface quality. Zhong et al. (2021) reviewed studies on the post-machining surface roughness of wood and advanced engineering materials and its prediction, and found that ANNs trained with PSO and GA have better prediction performance than traditionally trained ANNs. These results further demonstrate the advantage of combining swarm intelligence algorithms with machine learning methods for the prediction of surface roughness.

Surface roughness prediction method

In addition to reducing surface roughness by optimizing machining parameters, some studies have tried to model surface roughness prediction directly. Benardos et al. (2003) classified the modeling techniques for surface roughness prediction into three categories, namely experimental models, analytical models, and AI-based models. Among them, surface roughness prediction models based on AI methods are the main research direction in this field, and the AI-based methods can be divided into three branches: swarm intelligence algorithms, traditional machine learning algorithms, and artificial neural networks. On this basis, we analyzed the related research on polished surface roughness prediction and grouped the modeling techniques into two categories: classical modeling methods based on experimental and mathematical analysis, and modeling methods based on AI techniques.

Aiming to reduce surface roughness and improve the surface quality of mold steel, Xie et al. (2022) optimized the robotic polishing process parameters and obtained the optimal range of each parameter from single-factor experiments; a surface roughness prediction model was then established through central composite design experiments on three polishing parameters. Xu et al. (2021) proposed a numerical model for the prediction of surface roughness after laser polishing, which coupled heat transfer, fluid flow, and material vaporization, and used it to predict the surface roughness after laser polishing.

The design of surface roughness prediction methods based on classical approaches requires a great deal of expertise on the part of the researcher. Besides, these approaches often require complex mathematical modeling of the relationship between machining parameters and surface roughness. Owing to the data-driven, black-box mechanism of AI techniques, researchers can perform surface roughness prediction and modeling tasks without much domain expertise. As a result, more and more studies have begun to use AI techniques to solve surface roughness prediction problems and have found that AI-based surface roughness prediction models achieve higher prediction accuracy than classical methods. Electrochemical mechanical polishing (ECMP) results are influenced by many factors, which makes predicting the surface quality and determining the processing parameters difficult. To address these issues, Xu et al. (2012) developed an ECMP prediction model based on a least squares support vector machine with a radial basis function and investigated the effect of polishing parameters on surface roughness through orthogonal experiments.

In addition to using classical machine learning algorithms, many researchers have explored the application of neural networks to surface roughness prediction. Schneckenburger et al. (2020) developed an ANN model for predicting glass surface roughness and shape accuracy; the experimental results showed that the ANN predictions reduce the number of polishing iterations and thus the production time. Deng et al. (2021) investigated the effects of CMP process parameters on the MRR and surface roughness of single-crystal SiC after polishing, and their relationships, based on the modified Preston equation, and established a prediction model linking CMP process parameters, MRR, and post-polishing surface roughness based on BP neural networks. Qi et al. (2018) developed a prediction model for surface roughness in belt polishing based on a BP neural network, with the maximum depth of cut of the abrasive grains, the rotation speed of the belt, and the feed rate as input parameters. The model requires only a few experimental samples to deliver predictions with high accuracy, thus saving experimental cost and time.

Current challenges and reflections

In summary, for polishing processes, the current surface roughness prediction methods based on classical methods and artificial neural networks have their advantages, but also have their limitations at the same time, as described in the introduction section.

Ensemble learning, a widely used artificial intelligence technique, has been applied to various regression prediction problems and performs better than traditional mathematical methods. Owing to the simple structure of the base models and the strong performance of the ensemble framework, ensemble learning can achieve generalization performance no worse than that of neural network models without large amounts of experimental data, and commonly used interpretability methods can provide interpretable analysis for the ensemble learning model. In addition, given that swarm intelligence algorithms have been widely and successfully applied to machining parameter optimization problems, we combine a swarm intelligence algorithm with ensemble learning for surface roughness prediction modeling to achieve more accurate prediction of surface roughness.

We find that a large number of surface roughness studies based on swarm intelligence use algorithms such as GA or PSO, and few have applied DE to the study of surface roughness. However, compared with these algorithms, DE retains the global search capability while its simple difference-based mutation operation keeps the algorithm less complex and more efficient. Therefore, this paper proposes a DE-based ensemble learning model for predicting the surface roughness of polishing processes. Compared with other methods, the proposed method offers some interpretability while ensuring prediction accuracy. First, the DE-based ensemble model weighs the fitting effect of each basic regression model to realize the complementary advantages of the base models, which helps the ensemble model avoid underfitting when training on small samples of low-dimensional processing data and thus improves prediction accuracy. Second, interpretability analysis methods give a more comprehensive understanding of the data and the model, and provide meaningful information that helps us improve both, creating a positive feedback loop. Besides, we design a simplified encoding mechanism specifically for the surface roughness prediction problem of small-sample polishing and apply it to the individual design of the DE algorithm, which overcomes the low efficiency and poor accuracy of traditional encoding mechanisms. In addition, we use a GP-based EF algorithm for feature reconstruction of the experimental data to further improve the prediction accuracy and interpretability of the model. Finally, we perform interpretability analyses of the model and data at the feature level and identify the main processing parameters affecting surface roughness from the results of this analysis, which provides a theoretical reference for subsequent experiments and research.

ELDEA for surface roughness prediction

In this paper, an ensemble surface roughness prediction framework based on DE is presented. The method is applicable to several practical polishing cases, and the effects of different machining parameters on surface roughness for different workpiece materials are considered. Figure 1 shows the framework of our ELDEA method. ELDEA consists of five parts: (1) data normalization, (2) feature construction, (3) multi-algorithm regression, (4) DE-based ensemble learning, and (5) interpretability analysis.

Fig. 1
figure 1

Framework of ELDEA

Data normalization

In machine learning, it is often necessary to rescale data of different scales to a common scale, or to transform data of different distributions to a specific distribution; these operations are collectively called “nondimensionalization”. In gradient- and matrix-based algorithms, such as logistic regression, support vector machines, and neural networks, nondimensionalization can speed up training; in distance-based algorithms, such as K-nearest neighbors and K-means clustering, it can help improve model accuracy by preventing a feature with particularly large absolute values from dominating the distance calculation. There are various ways to implement such scaling, for example the standard scaler and the Min-Max scaler. Since Min-Max normalization is very sensitive to outliers, most machine learning applications choose standard normalization for feature scaling. Therefore, we employ standard normalization to pre-process the experimental data as shown in Eq. (1), where µ is the mean of all sample data and σ is their standard deviation. After normalization, the data follow the standard normal distribution with mean 0 and standard deviation 1.

$${x}_{normalization}=\frac{x-\mu }{\sigma }$$
(1)
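Eq. (1) is the standard z-score transformation; as a hedged illustration, the snippet below shows an equivalent computation with scikit-learn's StandardScaler, using made-up parameter values rather than the actual experimental data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up raw machining parameters (feed rate, pressure, tool offset, step distance)
X_raw = np.array([[10.0, 0.4, 5.0, 0.10],
                  [20.0, 0.6, 7.5, 0.15],
                  [30.0, 0.8, 10.0, 0.20]])

scaler = StandardScaler()             # estimates mu and sigma per column
X_norm = scaler.fit_transform(X_raw)  # applies (x - mu) / sigma, i.e., Eq. (1)

print(X_norm.mean(axis=0))            # approximately 0 for each feature
print(X_norm.std(axis=0))             # approximately 1 for each feature
```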

Feature construction

Extracting robust features from raw data is one of the most critical steps in training a highly accurate prediction model. Researchers often spend considerable time and effort on feature engineering to obtain more important information from raw data and train accurate machine learning models. Feature construction is a process that extracts and embeds the characteristics, properties, attributes, and underlying patterns of data as features (Li et al., 2015); it is an important step in feature engineering and directly determines the quality of the analysis results in data mining.

Constructing high-quality features is extremely challenging without relevant domain knowledge. To overcome this difficulty, automatic feature construction methods based on evolutionary algorithms have been proposed (Virgolin et al., 2018; Tran et al., 2019). Inspired by these methods, we employ the EF algorithm (Zhang et al., 2021) to construct features automatically for training the base regression models. By searching for the most effective feature space to represent the data, our approach not only optimizes feature quality but also improves the generalization ability of the regression model. The workflow of the EF algorithm is shown in Fig. 2. First, the algorithm is initialized to form a population of multi-tree GP individuals, i.e., each individual in the population is a set of GP trees. For each individual Φi, the EF algorithm then constructs a decision tree using the original features of the workpiece (e.g., φ1, φ2, φ3, …) to obtain an initial set of features. Lastly, new features are obtained through evolution. These new features not only increase feature diversity and help avoid model underfitting but also improve prediction accuracy. During the evolutionary process, the EF algorithm uses crossover, mutation, and selection operators to update the individuals in the population, as shown in Fig. 2. Taking the crossover operation as an example, for individuals Φ1 and Φ2, we select one tree in each of the two individuals and swap the values of the two nodes X2 and X3, thus generating two new GP trees that correspond to two new individuals.

To give the algorithm better generalization ability on regression tasks, the EF algorithm uses 5-fold cross-validation to estimate the generalization loss when evaluating each individual's fitness. First, the raw dataset generated by the AWJP experiments is standardized and randomly divided into a training set and a test set in the proportion of 3:1. The training set is then divided into 5 equal-sized folds: 4 folds are used for automatic feature construction and to fit each decision tree model in the EF, and the remaining fold is used as a validation set to verify the validity of the automatic feature construction. The training and validation folds are used for 5-fold cross-validation during the evolutionary process to determine the features. In our implementation, we adopt the absolute deviation as the loss value, as shown in Eq. (2), where f represents a decision tree, Xj the j-th sample, and Yj the j-th target value. A sample passed to a decision tree model returns a prediction value f(Xj), and the difference between the predicted value and the true target value is the prediction error Lj of the j-th sample. The prediction errors of all validation data points are recorded each time a decision model is built, and this error is used as the fitness function to guide the iterations of the algorithm.

$${L}_{j}=\left|f\left({X}_{j}\right)-{Y}_{j}\right|$$
(2)
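As a hedged sketch of how the fitness of one GP individual could be estimated with 5-fold cross-validation and the absolute deviation of Eq. (2): the function construct_features below is a hypothetical stand-in for the feature mapping encoded by the individual's GP trees, not part of the EF library itself.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

def individual_fitness(construct_features, X_train, y_train, n_splits=5, seed=0):
    """Estimate the generalization loss of one GP individual (Eq. (2) averaged over folds)."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    losses = []
    for fit_idx, val_idx in kf.split(X_train):
        # Map raw machining parameters to the features encoded by the GP trees
        Z_fit = construct_features(X_train[fit_idx])
        Z_val = construct_features(X_train[val_idx])
        tree = DecisionTreeRegressor(random_state=seed).fit(Z_fit, y_train[fit_idx])
        # Absolute deviation L_j = |f(X_j) - Y_j| on the held-out fold
        losses.append(np.abs(tree.predict(Z_val) - y_train[val_idx]))
    return np.concatenate(losses).mean()  # lower is better
```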

The well-performing GP individuals are stored in an archive, and each GP individual in the archive is used to construct a training set. A decision tree is then trained for each constructed training set. At the end of the evolutionary process, all these decision trees form a forest, i.e., the target forest. At this point, valid features can be constructed from this forest for subsequent ensemble learning.

Fig. 2
figure 2

An illustration of the workflow of EF algorithm

Multi-algorithm regression

In the multi-algorithm regression module, we leverage the base regression models to build an ensemble surface roughness prediction framework. We first trained ten base regression models and tuned the hyperparameters of each to reach its optimal performance. In the experiment, we found that some of the regression algorithms underfit the experimental dataset. Therefore, we selected the six that were well trained and converged to build the multi-algorithm regression module. The six selected models are ETR (Geurts et al., 2006), RF (Breiman, 2001), XGBoost (Chen & Guestrin, 2016), SVR (Awad & Khanna, 2015), LASSO (Tibshirani, 1996), and GBR (Friedman, 2001). Table 1 lists the details of these models and their regression errors. It should be noted that the experimental data are derived from AWJP experiments on 3D-printed CoCr components; detailed descriptive information about the data is given in the Validation experiments section. The training data are fitted by each algorithm, and each algorithm predicts the surface roughness values for the test data. Specifically, we normalized the raw data and then used the hold-out method to initially validate the performance of each base model. In the data segmentation process, the raw data are divided into four folds, three of which are used to train the base model, while the remaining fold is used to test the trained model and obtain its prediction error. This process is repeated five times, and the average of the five results is taken as the final prediction error of the current base regression model. Finally, the prediction results of the test dataset on each regression model and the corresponding errors are recorded as the baseline performance of the ensemble framework.

Table 1 Different machine learning models
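The following sketch illustrates the repeated hold-out evaluation of the six base models described above; the hyperparameters shown are scikit-learn/XGBoost defaults and placeholders, not the tuned settings reported in Table 1.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import (RandomForestRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor)
from sklearn.linear_model import Lasso
from sklearn.svm import SVR
from xgboost import XGBRegressor

def baseline_errors(X, y, n_repeats=5, seed=0):
    """Repeated 3:1 hold-out evaluation of the six base regression models."""
    models = {"RF": RandomForestRegressor(random_state=seed),
              "ETR": ExtraTreesRegressor(random_state=seed),
              "GBR": GradientBoostingRegressor(random_state=seed),
              "XGBoost": XGBRegressor(random_state=seed),
              "LASSO": Lasso(alpha=0.01),
              "SVR": SVR(kernel="rbf")}
    errors = {name: [] for name in models}
    for r in range(n_repeats):
        # 3:1 split with a different random seed on each repeat
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                                  random_state=seed + r)
        for name, model in models.items():
            model.fit(X_tr, y_tr)
            errors[name].append(mean_absolute_error(y_te, model.predict(X_te)))
    # Average over the repeats as the baseline error of each base model
    return {name: float(np.mean(v)) for name, v in errors.items()}
```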

DE-ensemble module

Due to the very small dataset size and low-dimensional features, each base regression model easily underfits the training set, leading to low prediction accuracy. In contrast, the ensemble framework can integrate the outputs of multiple base regression models so that the advantages of each model complement one another. Therefore, we employ an ensemble framework for surface roughness prediction and use the DE algorithm to search for the best ensemble weight of each base regression model, so as to maximize the integration effect and achieve the best prediction accuracy.

The DE algorithm, first proposed by Storn and Price in 1995 (Ahmad et al., 2021), is a population-based adaptive global optimization algorithm for solving real-valued optimization problems. As a common evolutionary algorithm, DE is widely used in fields such as data mining, pattern recognition, digital filter design, artificial neural networks, and electromagnetism, owing to its simple structure, easy implementation, fast convergence, and robustness. The evolutionary process of DE is very similar to that of GA: both include mutation, crossover, and selection operations. However, compared with GA, DE retains the population-based global search strategy, while its simple difference-based mutation operation and one-to-one competitive survival strategy reduce the complexity of the genetic operations.

Since DE has a simple structure and is easy to implement, we use the DE algorithm as a search engine in this module and design a simplified encoding mechanism for the individual design of the DE algorithm according to the problem to be optimized. This module is mainly responsible for integrating and processing the output of multiple algorithm modules, searching for the best weight assignment for each base regression model through the DE-based simplified encoding mechanism, and integrating each regression module according to the optimal solution obtained from the search to obtain the final surface roughness prediction value.

Individual design

DE is a population-based heuristic search algorithm that is very efficient at searching for the global optimum. In DE, each individual in the population corresponds to a solution vector, and each individual is regarded as a target vector, as shown in Eq. (3), where D is the dimension of the target vector and NP denotes the number of individuals in the population.

$$\vec{x}_{i}=\left({x}_{i,1},{x}_{i,2},\dots ,{x}_{i,D}\right), \quad i=1, 2, \dots ,NP$$
(3)

The individual design scheme of the algorithm is shown in Fig. 3a. When DE is applied to search for the weights of each regression model, each individual usually represents a weight assignment scheme for integrating the regression models, where αi denotes the weight assigned to the i-th regression model in that individual, i = 1, 2, …, m, and m denotes the number of base models; this is the traditional encoding method shown in Fig. 3b. In the traditional encoding method, a population contains n individuals of m dimensions each, meaning that the algorithm provides a total of n weight assignment schemes for the m models.

However, a large number of candidate solutions are generated during evolution, and since this optimization problem uses real-number encoding, the search space can become excessively high-dimensional and consume a large amount of computation time. Therefore, we design a simplified encoding mechanism to tackle this problem. This simplified encoding reduces the dimension of the search space from D to 1, which speeds up the search. In addition, it eliminates the need to spend time and effort on tuning the hyperparameters (i.e., the population size), further improving the efficiency of the algorithm. Figure 3b and c depict the differences between the traditional encoding mechanism and the simplified encoding mechanism of DE proposed in this paper. In our simplified encoding, each individual has only one dimension, a real number in the range [0, 1], indicating the weight assigned to the regression model corresponding to that individual. The whole population corresponds to the complete set of weight assignments of the ensemble framework, and the values of all individuals in the population sum to 1, so the model weights sum to 1. In this design, an individual is a single real number αi (αi ∈ [0, 1]) denoting the weight of model i in the ensemble framework, and the population is a sequence of real numbers [α1, α2, α3, …, αm−1, αm] of length m (here, 6) with ∑αi = 1.
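A minimal sketch of the simplified encoding is given below: each individual is a single real-valued weight in [0, 1], the population as a whole is one weight-assignment scheme, and a normalization step (our assumption of one convenient way to enforce the constraint) keeps the weights summing to 1.

```python
import numpy as np

N_MODELS = 6  # RF, XGBoost, LASSO, SVR, GBR, ETR

def normalize(weights):
    """Keep all alpha_i in [0, 1] and rescale so that sum(alpha_i) == 1."""
    w = np.clip(weights, 0.0, 1.0)
    return w / w.sum()

def random_population(rng):
    """One individual per base model; the whole population is a weight scheme."""
    return normalize(rng.random(N_MODELS))

rng = np.random.default_rng(0)
alphas = random_population(rng)
print(alphas, alphas.sum())  # six weights in [0, 1] summing to 1
```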

Fig. 3
figure 3

Difference between the traditional encoding mechanism of DE and our simplified encoding mechanism

Crossover, mutation, and selection operators

Before the algorithm iterates, we first randomly generate a number of new individuals that satisfy the constraints according to the individual design rules, forming the initial population. Next, the population is iteratively updated using mutation, crossover, and selection operators to search for the optimal solution. The evolutionary process of DE is very similar to that of GA, containing the basic operations of mutation, crossover, and selection, but the specific definitions of these operations differ from those of the genetic algorithm. For the mutation operation, the mutation operator generates a mutation vector vi for each target vector xi (i.e., individual) according to Eq. (4), where NP denotes the number of individuals in the population and r1, r2, and r3 are integers randomly selected from [1, NP], representing three randomly selected individuals from the population. F is the scale factor and is usually set to 0.8, and xr2 − xr3 is the difference vector.

$$\vec{v}_{i}=\vec{x}_{r1}+F\cdot\left(\vec{x}_{r2}-\vec{x}_{r3}\right), \quad i=1, 2, \dots ,NP$$
(4)

For the crossover operation, a new trial vector ui is generated using the binomial crossover operator in Eq. (5) for each pair of target vector xi and mutation vector vi, where randj is a random number uniformly distributed in [0, 1], jrand is a randomly chosen integer between 1 and D, and CR is a crossover control parameter in the range [0, 1].

$$u_{i,j}=\begin{cases}v_{i,j}, & \text{if } rand_{j}<CR \ \text{or} \ j=j_{rand}\\ x_{i,j}, & \text{otherwise}\end{cases}, \quad i=1,2,\ldots,NP, \; j=1,2,\ldots,D$$
(5)

For the selection operation, we compare the target vector xi with the trial vector ui according to the fitness objective function f, as shown in Eq. (6); the vector with the higher fitness value is retained for the next iteration.

$$x_{i,j}^{new}=\begin{cases}u_{i,j}, & \text{if } f\left({u}_{i}\right)\ge f\left({x}_{i}\right)\\ x_{i,j}, & \text{otherwise}\end{cases}, \quad i=1, 2, \dots ,NP$$
(6)

After a series of mutation, crossover, and selection operations, the population produces new individuals. This process is repeated until the maximum number of iterations is satisfied and the algorithm stops.
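For concreteness, the sketch below implements one DE generation over real-valued vectors according to Eqs. (4)-(6), assuming a minimization objective (e.g., the prediction error); the clipping to [0, 1] and the control parameter values are illustrative assumptions rather than prescribed settings.

```python
import numpy as np

def de_step(pop, fitness, objective, F=0.8, CR=0.9, rng=None):
    """One DE generation: mutate (Eq. 4), cross over (Eq. 5), select (Eq. 6)."""
    if rng is None:
        rng = np.random.default_rng()
    NP, D = pop.shape
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(NP):
        # Three distinct individuals other than i (requires NP >= 4)
        r1, r2, r3 = rng.choice([k for k in range(NP) if k != i], 3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])          # mutation, Eq. (4)
        j_rand = rng.integers(D)
        mask = rng.random(D) < CR
        mask[j_rand] = True
        u = np.where(mask, v, pop[i])                  # binomial crossover, Eq. (5)
        u = np.clip(u, 0.0, 1.0)                       # keep weights in [0, 1]
        f_u = objective(u)
        if f_u <= fitness[i]:                          # one-to-one selection, Eq. (6)
            new_pop[i], new_fit[i] = u, f_u
    return new_pop, new_fit
```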

DE-based ensemble

In the multi-algorithm regression module, we obtain the predicted value of each test sample from each base regression model; these predicted values are called response values. Before the algorithm iterates, a number of individuals are randomly initialized to form a population. Each individual in the population represents the weight assigned to one regression model, so the whole population corresponds to a model weight assignment scheme. During the iterations, the population is updated by evolutionary operations such as mutation, crossover, and selection, producing new weight assignment schemes. The weight value αm of model m can be obtained from the weight assignment scheme formed by DE, and the results of the base regression models can then be integrated. Combining the response value fm and weight value αm of model m, the final surface roughness prediction of a sample is calculated according to Eq. (7). The error between the predicted and actual values of the samples is then used as the fitness function of DE to guide the weight optimization of the ensemble framework. The detailed workflow of the DE-based ensemble calculation is shown in Fig. 4. The integrated prediction is given by Eq. (7), where \(\widehat{\text{y}}\) denotes the predicted surface roughness, fm is the response value of each regression model, αm is the weight of each model, and M is the number of models.

$$\widehat{y}=\sum _{m=1}^{M}{\alpha }_{m}{f}_{m}, \quad \text{subject to} \sum _{m=1}^{M}{\alpha }_{m}=1$$
(7)
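Eq. (7) and the corresponding DE fitness can be written compactly as below; the response matrix is assumed to hold the base-model predictions on the test samples, and MAE is used as the error, consistent with the evaluation metrics in this paper.

```python
import numpy as np

def ensemble_predict(responses, alphas):
    """Eq. (7): weighted sum of base-model response values.

    responses: array of shape (n_samples, M) with each model's predictions.
    alphas:    array of shape (M,) with weights summing to 1.
    """
    return responses @ alphas

def ensemble_fitness(alphas, responses, y_true):
    """MAE between the integrated prediction and the ground truth; used as DE fitness."""
    return float(np.mean(np.abs(ensemble_predict(responses, alphas) - y_true)))
```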
Fig. 4
figure 4

Flow chart of DE-ensemble module

Interpretability analysis

Although machine learning algorithms such as neural networks have been successfully applied in various fields, they are black boxes that prevent end-users from understanding why the model produces a given prediction. As a result, it is difficult to deploy data-driven machine learning models in many application fields, such as biomedicine and autonomous driving. To fully understand the decision basis of such models, the concept of interpretable machine learning has been proposed and has become one of the hottest research areas in AI. Currently, the most popular approach to interpreting machine learning models is to analyze feature importance and feature relevance.

Therefore, we apply interpretability analysis methods to the original features and the newly constructed features to uncover the insightful information contained in the original features and validate the usefulness of the newly constructed ones. By synthesizing the analysis results, we try to identify the main processing parameters that affect the surface roughness.

Interpretability analysis based on impurity reduction concept

The concept of impurity is derived from decision trees, where a larger impurity reduction indicates a more reasonable division at the corresponding branch of the decision tree. Information entropy and the Gini index are two common impurity indicators, which are often used to measure the effectiveness of feature selection. The GP-based EF algorithm first randomly constructs a number of features and calculates their importance scores based on impurity reduction. It then ranks the constructed features according to these importance scores. After that, the model constructs new features from the important ones by optimizing the loss function, which supervises the algorithm's search for high-quality features. Ultimately, the constructed features can effectively improve the generalization ability of the model. Based on the concept of impurity reduction, we rank the constructed features according to their importance. Finally, by comparing the constructed features with the original features, we can explain the effect of the processing parameters corresponding to each feature on the surface roughness.
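Impurity-based importances of this kind can be read directly from a fitted tree ensemble, for example via scikit-learn's feature_importances_ attribute; the sketch below is illustrative and assumes a matrix of EF-constructed features is available.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def rank_by_impurity_reduction(X_new, y, feature_names, seed=0):
    """Rank constructed features by their mean impurity reduction across a forest.

    X_new: hypothetical matrix of EF-constructed features; y: measured Sa values.
    """
    forest = ExtraTreesRegressor(n_estimators=200, random_state=seed).fit(X_new, y)
    order = np.argsort(forest.feature_importances_)[::-1]
    return [(feature_names[i], float(forest.feature_importances_[i])) for i in order]
```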

Interpretability analysis based on Shapley theory

The Shapley value was proposed by Shapley (1953) for solving the problem of contribution and benefit distribution in cooperative games. The concept is derived from game theory: several players cooperate in a game, and the final payoff is distributed according to the contribution of each player. Carrying this idea over to machine learning, each feature corresponds to a player, and (under a linear model) the coefficient of a feature corresponds to its contribution, so the coefficient multiplied by the average feature value is the average contribution. For a single instance xi, the contribution of the j-th feature to the prediction, ϕij, is the contribution of xij minus this average contribution, i.e., βjxij − βjE(Xj) in the linear case.

Shapley additive explanations (SHAP) is a Python package that can interpret the output of any machine learning model. The idea of SHAP is to calculate the marginal contribution of each feature to the model output and then interpret the black-box model at both the global and local levels. SHAP constructs an additive explanation model based on the Shapley values of game theory. For each sample, the machine learning model gives a prediction, SHAP treats all features as “contributors”, and the Shapley value is the value attributed to each feature in that sample. SHAP is an additive feature attribution method that expresses the model's predicted value as the sum of the attribution values of the input features. It is a typical post-hoc model interpretation method, mainly used to provide a visual interpretation of the samples and features in the model.

Shapley theory differs from impurity reduction theory. In the node-splitting process of a decision tree based on impurity reduction, the model aims to reduce impurity and finds a suitable feature and feature value so that the division makes the class attribution clearer; this idea is therefore often applied to the construction of new features. Shapley theory, in contrast, studies the contribution of the current feature values to the model when the features and the model are already known.

Thus, in addition to using interpretable analysis based on impurity reduction, we use the SHAP method to visualize the original features as well as the newly constructed features and explain the impact of the new features constructed based on the EF algorithm on the model performance by comparing the differences in Shapley values.
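The SHAP analysis can be reproduced along the following lines with the shap package; the data and model here are placeholders standing in for the constructed feature matrix and a fitted base learner, so the snippet only illustrates the workflow, not the exact setup used in this paper.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Placeholder data standing in for the (original or EF-constructed) feature matrix
rng = np.random.default_rng(0)
X_features = rng.random((40, 4))   # 40 samples x 4 machining parameters
y = rng.random(40)                 # measured Sa values (placeholder)

model = RandomForestRegressor(random_state=0).fit(X_features, y)

explainer = shap.TreeExplainer(model)            # explainer tailored to tree ensembles
shap_values = explainer.shap_values(X_features)

# Global importance: mean |Shapley value| per feature, as in the bar charts of Fig. 11
shap.summary_plot(shap_values, X_features, plot_type="bar")
```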

Validation experiments

Validation experiment of abrasive water jet polishing on 3D-printed CoCr alloy

CoCr alloy has been widely used for artificial implants owing to its excellent corrosion resistance and superior mechanical properties. 3D printing has become more and more popular in the manufacturing of customized CoCr implants; however, the as-printed surface is very rough, and post-processing is always required before use. In this study, AWJP was used to polish the 3D-printed CoCr alloys; AWJP has been widely used in polishing many other complicated surfaces made of different materials (Fähnle et al., 1998; Wang et al., 2022; Wang et al., 2017a, b). Before this study, a great deal of trial and error had to be performed to obtain feasible polishing parameters that would allow the machined surface roughness to meet specific requirements. Therefore, there is an urgent need to develop surface roughness prediction models to save time and cost in practical mass production.

The polishing equipment used in this study is a ZEEKO IRP200 polishing machine, as shown in Fig. 5. To explore the relationship between the AWJP parameters and surface roughness, this study conducted AWJP experiments on 3D-printed CoCr components. The dimensions of the sample were 42 mm × 40.5 mm × 10 mm, and the polishing region for each test was 6 × 3 mm. The polishing slurry was 1000# aluminum oxide (FUJIMI Inc., Japan) at a weight fraction of 10%. A sapphire nozzle with a diameter of 1 mm was used, and the polishing slurry impinged vertically on the target surface. Key polishing parameters such as feed rate, fluid pressure, stand-off distance, and step distance were investigated, while the others were kept constant. After polishing, the surface roughness in surface mean height (Sa) was measured with a ZYGO Nexview white light interferometer. Three different positions were measured randomly to obtain the final experimental results (Avg_Sa). Forty data sets were obtained under different polishing conditions, covering feed rate, pressure, tool offset, and step distance. The detailed parameters are shown in Table 2, and the experimental results are listed in Table 3.

Fig. 5
figure 5

Experimental set-up for abrasive water jet polishing

Table 2 Parameters setting of AWJP experiments
Table 3 AWJP experiments data

Parameter setting

We randomly divided the above experimental data into training and test sets in the proportion of 3:1 and normalized the data according to Eq. (1). The normalized training data are fed into the EF algorithm: four folds are used for automatic feature construction and to fit each decision tree model in the EF, and the remaining fold is used as a validation set to verify the validity of the automatic feature construction. After that, we used grid search to tune the parameters on the new training data and obtain the best hyperparameters for each model. We then performed regression prediction on the test dataset with the best-trained models. Finally, the prediction results of each regression model were used as input to the DE-based ensemble learning framework, which computes the integrated prediction values by weighting the inputs according to the best weight assignment scheme found in the search. We randomly divided the data five times, repeated the above process, and took the average of the five prediction errors as the final prediction result. The error between the predicted and true values was used as the fitness function of DE to guide the search for a better solution, and we used the common error metrics mean squared error (MSE) and mean absolute error (MAE), whose detailed calculations are given by Eqs. (8) and (9). The units of both MAE and MSE are µm.

In this paper, the maximum number of iterations of DE was 150,000 and the population size was 6. In addition, to avoid fluctuations caused by the randomness of the data, we repeated the experiments five times and took the average of the five runs as the final experimental result.

$$\text{M}\text{A}\text{E}=\frac{1}{m}\sum _{i=1}^{m}\left|{\widehat{y}}_{i}-{y}_{i}\right|$$
(8)
$$\text{M}\text{S}\text{E}=\frac{1}{m}\sum _{i=1}^{m}{\left({\widehat{y}}_{i}-{y}_{i}\right)}^{2}$$
(9)
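For the grid-search tuning step mentioned above, a minimal sketch with scikit-learn's GridSearchCV is shown below; the parameter grid and the synthetic data are placeholders, not the tuned settings used in the experiments.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Placeholder training data standing in for the normalized AWJP training split
rng = np.random.default_rng(0)
X_train, y_train = rng.random((30, 4)), rng.random(30)

# Illustrative grid only; the tuned settings used in the paper are not reproduced here
param_grid = {"n_estimators": [50, 100, 200],
              "max_depth": [None, 3, 5],
              "min_samples_leaf": [1, 2, 4]}

search = GridSearchCV(ExtraTreesRegressor(random_state=0), param_grid,
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)
best_etr = search.best_estimator_  # best hyperparameters re-fitted on the training data
```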

Results and discussion

Performance comparison before and after feature construction based on the EF algorithm

As shown in Figs. 6 and 7, we compared the model performance before and after applying the EF algorithm for feature construction (using MAE and MSE as the performance measures). The blue and red bars in the figures indicate the results of the five ELDEA experiments before and after feature construction, respectively. The dark blue dashed line indicates the average error of the five ELDEA experiments before feature construction, and the dark red dotted line the average error after feature construction, which visually demonstrates the effectiveness of feature construction with the EF algorithm. Specifically, the MAE decreases from 0.2425 to 0.2302, a reduction of about 5.4%, and the MSE decreases from 0.1611 to 0.1137, a reduction of about 41.6%. In addition, the experimental results based on different data distributions demonstrate the robustness of the ELDEA method.

Fig. 6
figure 6

Performance comparison between before and after feature construction (MAE)

Fig. 7
figure 7

Performance comparison between before and after feature construction (MSE)

To further verify the effectiveness of the EF algorithm, we also compared the performance of each base regression model before and after applying the EF algorithm for feature construction, as shown in Fig. 8a and b. It can be seen that applying the newly constructed features to most of the base regression models yields lower prediction errors, which confirms the effectiveness of the proposed feature construction method for improving the accuracy of surface roughness prediction.

We analyzed why the method based on EF feature construction outperforms the original method without feature construction as follows. As an evolutionary algorithm, the EF algorithm performs automatic feature construction with the goal of minimizing the prediction error, i.e., it uses the prediction error as a fitness function to guide the direction of evolution. During iterative evolution, the algorithm retains the features that reduce the prediction error and eliminates those that do not help prediction, so that the final new features effectively improve the prediction performance compared with the original features.

Fig. 8
figure 8

Performance comparison of each base regression algorithm before and after feature construction

Comparison of ELDEA and classical machine learning

We compared the prediction performance of the ELDEA method with six classical regression algorithms on the AWJP experimental data, using MAE and MSE as evaluation metrics, to verify the effectiveness of the proposed ensemble method. As shown in Table 4, the ELDEA method achieves the best prediction results among the six classical regression methods, and its results improve on those of every base regression model to varying degrees. Specifically, through the weight optimization of the DE algorithm, the mean MSE and MAE of ELDEA are reduced by 0.3164 and 0.1682, respectively, compared with the six classical regression algorithms. In addition, the optimal MSE and MAE of ELDEA are reduced by about 49.0% and 15.7%, respectively, compared with the best of the six base regression models. This shows that the integration effect of the ELDEA method is significant.

Taking the MAE experiment on the AWJP dataset as an example, ELDEA gives the best results, the ETR algorithm gives the second-best results, and the GBR and LASSO algorithms give the worst results. The integration result indicates that the optimal weights of the six regression models determined by ELDEA, namely RF, LASSO, XGBoost, ETR, GBR, and SVR, are 0.0451, 0.0056, 0.1742, 0.5432, 0.0061, and 0.2257, respectively. ELDEA finds that the ETR algorithm performs better on these experimental data, so the weight of the ETR base regression model is elevated during the weight assignment process. Conversely, the GBR and LASSO algorithms receive the lowest final weights due to their poor performance.

A possible reason for the high weight of the ETR model is as follows: the features used by the ELDEA algorithm are constructed by the EF algorithm, which builds features by integrating multiple decision trees, so the constructed features may be more suitable for ensemble decision-tree-based models such as ETR and RF. ETR is a decision-tree-based regression algorithm that uses multiple decision trees for prediction based on randomly selected feature subsets. Compared with other decision-tree-based algorithms, ETR introduces more randomness when constructing decision trees to avoid overfitting. For a single decision tree, the predictions are often inaccurate because its split attributes are chosen randomly, but combining multiple decision trees achieves good prediction results. Since the sample size of the current experimental dataset is inherently small, this randomness brings more possibilities to the results, which may also be why ETR outperforms the other algorithms.

The above results fully demonstrate the effectiveness of the ELDEA integration framework, i.e., our algorithm is not only able to select the better-performing base regression models and give them larger weights, but also obtains better prediction performance than the optimal base model through the integration framework.

Table 4 Comparison of MSE & MAE of different methods in AWJP. (Unit: µm)

Comparison of ELDEA and neural network algorithms

To further verify the performance of the proposed algorithm, we compared the prediction performance of the ELDEA method with three neural network algorithms, namely an artificial neural network (ANN), a convolutional neural network (CNN), and a recurrent neural network (RNN), on the AWJP experimental data, again using MAE and MSE as evaluation metrics; the results are shown in Fig. 9. For the neural network algorithms, experiments were conducted using the hold-out method on the raw data and the feature-reconstructed data, respectively. Specifically, we divided the data into training and test sets in a 3:1 ratio and trained and predicted with the three neural network models. To ensure a fair comparison, we again used different random seeds for five experiments and took the average of the five results as the final prediction. For the ANN, the activation function is ReLU, the optimizer is Adam, the regularization coefficient is 0.01, and the maximum number of training iterations is 200. For the CNN, the activation function is ReLU, the optimizer is Adam, the loss function is the absolute error, the network contains a fully connected layer, a convolutional layer, and a pooling layer, the convolution window size is 8, and the number of training epochs is 200. For the RNN, the activation function is ReLU, the optimizer is Adam, the loss function is the absolute error, the network contains a fully connected layer, a GRU layer, and a pooling layer, the GRU layer has 16 units, and the number of training epochs is 200.

Fig. 9
figure 9

Performance comparison with neural network algorithms before and after feature construction

From the results in Fig. 9a, it can be seen that the MAE of the ELDEA algorithm is the smallest on both the data before and after feature construction, compared with the other three neural network algorithms, which shows that the ensemble model can achieve better performance than neural network algorithms on this type of dataset.

In addition, to further verify the performance advantage of the proposed algorithm, we also conducted a comparative experiment using MSE as the evaluation metric. The results are shown in Fig. 9b, and it can be seen that the performance of the proposed algorithm is still significantly better than that of the neural network algorithms. We analyzed the possible reasons for this phenomenon as follows: because the amount of experimental data is small, neural network algorithms are prone to overfitting without the support of a large amount of training data, and thus their prediction performance is not necessarily better than that of ensemble learning methods.

Comparison of ELDEA and other algorithms

Similar to our proposed algorithm, the ELGA algorithm (Wang et al., 2022) is an ensemble framework based on evolutionary algorithms for surface roughness prediction, so we compared the performance differences between ELDEA and ELGA algorithms on the MJP experimental dataset of 3D-printed 316 L stainless steel. In the MJP experiments, 3D-printed 316 L stainless steel parts were polished by a ZEEKO IRP200 machine with different process parameters. After polishing, the surface roughness Sa was measured three times and averaged to obtain the final Sa value. We obtained 43 experimental data items with different polishing parameters, including feed rate (f), fluid pressure (P), tool offset (TO), step distance (d), and surface roughness (Saf). The detailed experimental data are shown in Table 5.

Table 5 MJP experiment data

We followed the experimental setup of Wang et al. (2022a, b), randomly dividing the dataset into training and test sets in the ratio of 9:1 and using the ratio of MAE as the evaluation index, as shown in Eq. (10). Ten experiments were run and averaged to obtain the final result. A smaller ratio of MAE indicates higher prediction accuracy. In addition, we also compared with two other ensemble learning algorithms, namely the averaging and stacking algorithms; the results are shown in Table 6. It can be seen that the prediction performance of the ELDEA method is significantly better than that of the other three ensemble methods.

$$\text{The ratio of MAE}=\frac{1}{m}\sum _{i=1}^{m}\left(\left|{\widehat{y}}_{i}-{y}_{i}\right|/{\widehat{y}}_{i}\right)$$
(10)
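A direct implementation of Eq. (10) could look as follows (assuming no predicted value is zero):

```python
import numpy as np

def ratio_of_mae(y_pred, y_true):
    """Eq. (10): mean of |y_hat - y| / y_hat over all test samples."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return float(np.mean(np.abs(y_pred - y_true) / y_pred))
```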
Table 6 Comparison of the ratio of MAE among different ensemble strategies

Time complexity analysis of ELDEA

In this section, we examine the time complexity of ELDEA in terms of three parts: the EF-based automatic feature construction module, the multi-algorithm regression module, and the DE-based ensemble module, since they are the most time-consuming parts of the whole algorithm. For the EF-based automatic feature construction module, assuming the time complexity of training one model is O(1), the overall time complexity of the module is O(G1|P1|), where G1 and |P1| represent the maximum number of iterations and the population size of the EF algorithm, respectively. For the multi-algorithm regression module, hyperparameter tuning is the most time-consuming part. Assuming the time complexity of tuning one model is O(m), the overall complexity of the module is n·O(m), where n, the number of base models, is a constant; the module's complexity therefore simplifies to O(m). For the DE-based ensemble module, which like the EF module is an evolutionary algorithm, the time complexity is O(G2|P2|), where G2 and |P2| represent the maximum number of iterations and the population size of the DE algorithm, respectively. In summary, the overall time complexity of the ELDEA algorithm is determined by max[O(G1|P1|), O(m), O(G2|P2|)].

All methods were implemented in Python, and the programs were run on a computer equipped with an i9-9750H 2.60 GHz CPU and an NVIDIA GeForce RTX 3060 GPU. Table 7 shows the single-run time and the average run time over five runs of the ELDEA algorithm (unit: seconds).

Table 7 Running time of ELDEA algorithm

Discussion of interpretability analysis results

Discussion of interpretability analysis based on impurity reduction concept

In this section, we ranked the constructed features based on the concept of impurity reduction and conducted an interpretability analysis to explore feature importance. The importance ranking of the constructed features is shown in Fig. 10, which contains the top 15 most important of the newly constructed features, where X0, X1, X2, and X3 represent feed rate, fluid pressure, tool offset, and step distance, respectively. It can be found that 13 of the top 15 important features are related to feed rate, which indicates that this machining parameter has a significant effect on the surface roughness. We also find that only a very small number of the newly constructed features are related to tool offset, which indicates that this machining parameter is not a key factor for predicting surface roughness. Figure 10 shows that the most important feature is the product of feed rate and fluid pressure. It can be speculated that the joint action of these two parameters may have a greater effect on the surface roughness, and in subsequent machining experiments we can design more interaction experiments based on these two machining parameters for verification. Comparing the third and fourth most important features, we find that their main difference is an additional multiplicative factor of the feed rate parameter, and the feature that incorporates the feed rate ranks higher. This indicates that, after incorporating the feed rate, the combined effect of the three processing parameters on the surface roughness is more significant, which again demonstrates the importance of the feed rate parameter. In addition, the sum of two machining parameters, fluid pressure and step distance, also ranks as an important feature. In the future, we can explore the interaction of these two machining parameters in conjunction with relevant domain knowledge to gain new insights for advancing polishing technologies.
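As an illustration of how such an impurity-reduction ranking can be obtained, the sketch below uses scikit-learn's impurity-based (mean decrease in impurity) feature importances; the choice of RandomForestRegressor and the helper name are assumptions for illustration, not necessarily the model behind Fig. 10.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def impurity_importance_ranking(X_new, y, feature_names, top_k=15):
    """Rank EF-constructed features by impurity-based (MDI) importance.

    X_new holds the constructed features; returns the top_k (name, score)
    pairs in descending order of importance.
    """
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_new, y)
    order = np.argsort(model.feature_importances_)[::-1][:top_k]
    return [(feature_names[i], model.feature_importances_[i]) for i in order]
```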

Fig. 10 Feature importance ranking based on impurity reduction concept

Discussion of interpretability analysis based on Shapley theory

In this section, we used the SHAP method, based on Shapley theory, to further analyze the original features in an interpretable manner. For each feature, we took the average of the absolute values of its Shapley values before and after feature construction as the feature importance measure, which is presented as bar charts in Fig. 11a and b. Figure 11a shows the four features extracted from the original data, which are the four main processing parameters. Figure 11b shows the top five most important newly constructed features in order of feature importance. Comparing subplots (a) and (b), it can be found that more features with higher Shapley values are constructed by the EF algorithm. Compared with the original features, the newly constructed features have better diversity and are more robust, thereby improving the prediction performance. For subplot (a) of Fig. 11, we found that the feature importance ranking obtained with the SHAP method is largely consistent with the results exhibited in Fig. 10, which further illustrates the reliability of both interpretation methods. In addition, slightly different from the results in Fig. 10, the results in subplot (b) of Fig. 11 indicate that some of the newly constructed features do not contribute to the model under the calculation method of Shapley theory. Different methods of calculating feature importance may yield significantly different results for the same feature; we therefore need to explore both methods further and, in combination with the nature of the features themselves, weigh which features play a major role in modeling.
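The mean-absolute-Shapley importance used for such bar charts can be computed with the shap package roughly as follows; TreeExplainer (which assumes a tree-based model) and the variable names are illustrative assumptions rather than the exact code behind Fig. 11.

```python
import numpy as np
import shap  # requires the `shap` package

def mean_abs_shap(model, X, feature_names):
    """Mean absolute Shapley value per feature, one number per feature."""
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)      # (n_samples, n_features)
    importance = np.abs(shap_values).mean(axis=0)
    return dict(zip(feature_names, importance))

# A bar chart analogous to Fig. 11a/b can then be drawn with:
# shap.summary_plot(shap_values, X, feature_names=feature_names, plot_type="bar")
```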

Fig. 11 Comparison of Shapley values of each feature before and after feature construction

For each sample, we plotted the Shapley values of each of its features, as shown in Fig. 12. The plot helps us to better understand the overall pattern from a global view and possibly identify prediction outliers. The figure contains all samples, with each point representing one sample: the horizontal axis ranks the samples by Shapley value, the vertical axis ranks the features by importance, and the depth of the color indicates the magnitude of the feature value (red for high, blue for low). It is worth noting that the feature value here differs from the Shapley value and is the value taken by the feature itself.
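A comparable per-sample, color-coded view can be produced with a standard SHAP summary (beeswarm) plot, although the exact plot type used for Fig. 12 is not specified here; the snippet assumes the shap_values, X, and feature_names from the previous sketch.

```python
import shap

# Each dot is one sample, rows are features ordered by importance,
# and the color encodes the feature value (red = high, blue = low).
shap.summary_plot(shap_values, X, feature_names=feature_names)
```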

From Fig. 12, it can be found that the feed rate is the most important feature. As the feed rate increases, the corresponding Shapley value also increases (i.e., the color of the sample points gradually changes from blue to red along the x-axis). When the feed rate is too low, the contribution of this feature to the surface roughness prediction model is reduced and can even have a negative effect on the prediction. Therefore, in subsequent experiments, we can set a minimum threshold for the feed rate so that this processing parameter is kept at or above the threshold; this can help us reduce the number of experiments and thus improve experimental efficiency. In addition, fluid pressure and step distance are also main influencing factors, while tool offset has little effect on the surface roughness, which is consistent with the analysis in the previous section and further confirms the primary and secondary factors influencing surface roughness. The plot also shows that fluid pressure and step distance are positively correlated with the predicted surface roughness within a certain range, i.e., the larger the value, the more significant the effect on surface roughness. Finally, some outliers can be identified in the plot; we will analyze and explore these outliers in a subsequent study.

Fig. 12 Distribution of Shapley values for each sample

Conclusions and future work

In this paper, a novel surface roughness prediction method based on the DE algorithm and ensemble learning is proposed and applied to surface roughness prediction for abrasive water jet polishing (AWJP). Our method uses six machine learning models as the base models of the ensemble framework, namely RF, LASSO, XGBoost, ETR, GBR, and SVR, which are responsible for modeling the relationship between machining parameters and surface roughness. First, a feature reconstruction method based on the EF algorithm is used to reconstruct the features of the experimental data, and the reconstructed data are used for training and prediction by the base regression models. Then, the DE algorithm is used to optimize the weight assigned to each model. For this optimization problem, a simplified encoding mechanism is proposed for the individual design of the DE algorithm, which further improves its search efficiency. We validated the proposed method on the AWJP dataset and compared its performance with that of other methods. The results showed that ELDEA has significant advantages over classical machine learning algorithms and other ensemble learning algorithms. On the AWJP experimental dataset, the ELDEA algorithm reduces MSE and MAE by about 33–84% and 14–64%, respectively, compared to the classical machine learning algorithms. For the MJP experiments, the ELDEA approach reduces the ratio of MAE by about 37%, 35%, and 16%, respectively, compared to the other three ensemble learning methods. Finally, the data features are analyzed and discussed using an interpretable approach to identify the main processing parameters that affect surface roughness, and the results of the interpretability analysis provide a theoretical reference for future experiments and research.

The ELDEA method can accept different inputs through its ensemble model and is therefore expected to be transferable to surface roughness prediction for other machining methods such as turning and milling. In addition, the interpretability methods used in the current study all perform analysis at the feature level by combining data and models; the self-explanation of the model itself remains to be explored. In the future, we aim to combine the interpretability study of the model with that of the features and use a more comprehensive interpretability approach for analysis and discussion. Therefore, future work will be devoted to investigating and improving the ELDEA method so that it can become a general method for surface roughness prediction with more comprehensive interpretability.