Introduction

High-performance concrete (HPC) is a widely used cement-based composite material1 consisting of water, cement, coarse aggregate, and additional mineral and chemical admixtures. It is produced by mixing these constituents in specific proportions, based on engineering experience or test formulas, and then vibrating, forming, curing and hardening the mixture. Its widespread use is attributed to its excellent plasticity, fire resistance and corrosion resistance, and to its use of local materials. The concept of HPC was first proposed by the National Institute of Standards and Technology and the American Concrete Institute in 1990. Compared with ordinary concrete, HPC offers superior durability, workability and volume stability2.

At the same time, the mapping between the components of HPC and its compressive strength has become more complicated and multi-dimensional. Traditional modeling methods rely on researchers' descriptions of physical phenomena and often have to oversimplify practical problems. As a result, the model fails to capture enough of the mapping relationships, and the error is too large to achieve the desired effect. Traditional modeling methods also face many difficulties in data acquisition, processing and final verification. Therefore, given the higher-dimensional mapping between the components of HPC and its compressive strength, it is difficult to derive a universal equation that matches the accuracy of compressive strength measured in a laboratory environment6,7,8,9, even with existing empirical regression formulas3,4,5 and a large number of test samples.

With the continuous development of artificial intelligence, the powerful pattern recognition and nonlinear fitting capabilities of machine learning (ML) give it an advantage in solving complex engineering problems10. In particular, much research has addressed the prediction of the basic properties of concrete, and ML has become a fruitful new tool for studying concrete performance.

Adding fibers, nanoparticles or other substances to concrete can improve its mechanical properties11 and yield new types of concrete, but few studies have addressed the prediction of their performance. Chithra et al.12 used artificial neural networks to accurately predict the compressive strength of concrete containing nano-silica particles and copper slag. Ashrafian et al.13 used a heuristic regression method to predict the strength and ultrasonic pulse velocity of fiber-reinforced concrete. 3D printed concrete has attracted wide attention in recent years: its short construction time, low labor cost and high design freedom make it suitable for various fields14. However, its anisotropy makes its compressive strength difficult to predict reliably. The beetle antennae search technique proposed in ref. 15 can automatically search for the optimal hyperparameters, enabling accurate prediction of the compressive strength of 3D printed concrete under steam curing conditions. Long et al.16 established a long short-term memory network model to further improve the accuracy of predicting the dynamic compressive strength of concrete-like materials at high strain rates. These studies offer ideas for predicting the performance of other types of concrete and have practical guiding significance. Further ML studies are summarized in Table 1.

Table 1 A synopsis of models from earlier research.

In short, ML, as a hallmark of the age of intelligence, opens up many possibilities in civil engineering. It not only improves production efficiency but also brings major changes to areas such as structural quality control and structural safety monitoring. Especially in the era of "double carbon", ML plays a large role in the development and performance improvement of building materials. Applying ML to civil engineering can therefore help achieve the "double carbon" goal earlier and support the diversified and intelligent development of the field.

In this paper, the single back-propagation neural network (BPNN) and support vector machine (SVM) models are each combined with a genetic algorithm (GA). The combination compensates for the SVM's lack of parallel computing capability and for the BPNN's tendency to fall into local optima and suffer vanishing gradients during training. More input variables are also introduced to further improve the accuracy and reliability of the prediction model. In addition, the interpretability of the model is verified by gray relational analysis (GRA) and Shapley analysis, which supports the reliability of the model interpretation. The work provides guidance for subsequent researchers evaluating the compressive strength of HPC.

Description and analysis of database

For machine learning techniques, database compilation is a fundamental and critical step. To make the research results general, this paper studies the composition of HPC. Experimental data samples were collected from nine published sources, and outlying samples with a water-binder ratio below 0.1 or above 0.5 were removed to establish a database of 28-day HPC cube compressive strength. Table 2 lists detailed statistics of the data used in this article.

Table 2 HPC compressive strength test parameters.

The distribution of the input variables greatly affects how well the constructed model generalizes36. The cumulative frequency histograms in Fig. 1 illustrate the input factors that affect the compressive strength of HPC. The figure shows that the frequencies of the variables are suitably high and that the input variables cover wide ranges. Cement content ranges from 220 to 708 kg/m³, with most values between 300 and 650 kg/m³. The water content range is narrower than that of cement, with most values between 160 and 185 kg/m³. For silica fume and fly ash, the content is zero in many cases; the contents are spread over 0–200 kg/m³ and 0–275 kg/m³, respectively, so the variation is relatively large. The water-binder ratio is another important factor influencing the compressive strength of HPC37,38, and its values are concentrated around 0.3. In this research, the compressive strength of HPC ranges from 38 to 123 MPa and is concentrated between 40 and 80 MPa.

Figure 1. Statistical distributions of the input/output variables: (a)–(i).

Figure 2 shows the multiple-correlation matrix, in which different colors represent different correlation values. Two factors are positively correlated when the value at their intersection is positive and negatively correlated when it is negative. Among the 8 input variables selected, the correlation between superplasticizer and silica fume content is the strongest (R = 0.84), which is consistent with previous research. It is well known that the dosages of silica fume and superplasticizer can considerably affect the strength and mechanical characteristics of HPC37,38. As for the relation between input and output variables, superplasticizer has the largest positive correlation with compressive strength, followed closely by silica fume and cement. Overall, the positive and negative correlations between the input and output variables differ little in magnitude. Consequently, to guarantee the model's accuracy, all eight input variables were employed.

Figure 2. Multiple correlations of input variables.
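As an illustration of how a correlation matrix of this kind can be computed, the following minimal sketch evaluates pairwise correlation coefficients in plain Python. It assumes the Pearson coefficient, which the text does not state explicitly, and the column values are hypothetical toy numbers rather than entries from the database.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}

# Hypothetical toy values, NOT taken from the paper's database.
toy = {
    "SP": [2.0, 4.0, 6.0, 8.0],
    "SF": [10.0, 22.0, 29.0, 41.0],
    "W": [185.0, 178.0, 170.0, 160.0],
}
matrix = correlation_matrix(toy)
```

A positive entry such as `matrix[("SP", "SF")]` indicates that the two variables rise together, while a negative entry such as `matrix[("SP", "W")]` indicates that one falls as the other rises.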

Machine learning approaches

BPNN

An artificial neural network (ANN) is a computational model that simulates the connection, excitation and inhibition of neurons in the human brain. Its structure typically comprises three layers of neurons: the input layer, the hidden layer and the output layer. Neurons in the input layer receive information from the external environment, while neurons in the hidden and output layers carry out the linear and nonlinear approximation of the system under investigation. In the linear phase, each neuron sums the weighted values received from the neurons of the preceding layer. In the nonlinear phase, an activation function is applied to this sum, and the result is passed on as the neuron's output. Weights connect the neurons of each layer to those of the next in sequence. Figure 3 illustrates the structure of the ANN. The numbers of neurons in the input and output layers equal the numbers of features and labels in the data set, respectively. The number of neurons in the hidden layer can be determined using an empirical formula or by trial and error.

Figure 3. Architecture of ANN.
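The linear and nonlinear phases of a three-layer network can be illustrated with a minimal forward pass. This is only a sketch: the tanh activation, the linear output neuron and all weight values are assumptions chosen for illustration, not the configuration used in the paper.

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through an input -> hidden -> output network."""
    # Linear phase: weighted sum per hidden neuron.
    # Nonlinear phase: tanh activation applied to that sum (assumed activation).
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    # Linear output neuron, a common choice for a regression target.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical weights for 2 inputs, 2 hidden neurons and 1 output.
w_hidden = [[0.5, -0.2], [0.1, 0.3]]
b_hidden = [0.0, 0.1]
w_out = [1.0, -1.0]
b_out = 0.2
y = forward([1.0, 2.0], w_hidden, b_hidden, w_out, b_out)
```

Training by back-propagation would adjust `w_hidden`, `b_hidden`, `w_out` and `b_out` to reduce the prediction error; that step is omitted here.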

Support vector machine regression (SVM)

Cortes and Vapnik39 introduced SVM as a solution to classification problems. Support vector regression is an important branch that extends SVM to regression and prediction problems. Conventional regression and prediction models usually compute the loss from the difference between the model's predicted and actual output values, so the loss is zero only when the predicted value matches the true value exactly. As shown in Fig. 4, Gaussian radial basis functions can map the low-dimensional feature space of the samples into a high-dimensional feature space to enhance the performance of the regression model and better fit the collected data. The output of the SVM is expressed as a linear function, as follows:

$$ f(x) = \left\langle {\omega , \varphi (x)} \right\rangle + b $$
(1)

where \(\langle \cdot , \cdot \rangle\) denotes the dot product; \(\varphi (x)\) maps the input features to a higher-dimensional feature space; b is the bias term; and \(\omega \) is the weight vector obtained by solving the minimization problem in Eq. (2).

Figure 4. Support vector machine algorithm structure.

$$ \text{minimise}\quad \frac{1}{2}\left\| \omega \right\|^{2} + C\sum\limits_{j = 1}^{n} {\left( {\xi_{j} + \xi_{j}^{*} } \right)} $$
(2)
$$ \text{subject to}\quad \left\{ {\begin{array}{l} {y_{j} - \left\langle {\omega , \varphi \left( {x_{j} } \right)} \right\rangle - b \le \varepsilon + \xi_{j} } \\ {\left\langle {\omega , \varphi \left( {x_{j} } \right)} \right\rangle + b - y_{j} \le \varepsilon + \xi_{j}^{*} } \\ {\xi_{j} ,\xi_{j}^{*} \ge 0} \\ \end{array} } \right. $$

where C is the regularization parameter; n is the number of samples; \({\xi }_{j}\) and \({\xi }_{j}^{*}\) are slack variables; \(\varepsilon \) is the tolerance of the \(\varepsilon \)-insensitive loss function; and \({y}_{j}\) is the experimental value.
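The building blocks of this formulation can be sketched in a few lines of Python. The sketch relies on the standard observation that, at the optimum, the slack sum for a sample equals its ε-insensitive loss, max(0, |y − f(x)| − ε). The function names and the kernel parameter name `gamma` are illustrative assumptions and do not come from the paper.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def svr_objective(omega, y_true, y_pred, C, epsilon):
    """0.5 * ||omega||^2 + C * (sum of slacks), with slacks at their optimal values."""
    norm_sq = sum(w * w for w in omega)
    slack = sum(epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(y_true, y_pred))
    return 0.5 * norm_sq + C * slack

# Hypothetical numbers: the first prediction lies inside the tube, the second outside.
objective = svr_objective([3.0, 4.0], [60.0, 80.0], [60.5, 83.0], C=10.0, epsilon=1.0)
```

In this toy case only the second sample contributes slack, so increasing `C` penalizes that deviation more heavily relative to the norm of the weight vector.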

Principle of genetic algorithm (GA)

GA originated from computer-based numerical simulation of natural biological evolution. It is a competitive, random global search and optimization technique created by modeling how organisms naturally evolve40. It is based on Mendelian genetics and Darwin's theory of evolution and is used for hyperparameter tuning in various machine learning algorithms. During an iterative search, it automatically gathers information about the search space and adaptively steers the search toward the best solution. It can therefore be regarded as a global, efficient, parallel heuristic search method. The conventional BPNN tends to become trapped in a local minimum, and the performance of the SVM depends strongly on its hyperparameters. In this work, the random global search and optimization capability of the GA is used to optimize the initial weights and biases of the BPNN and to search globally for the hyperparameters of the SVM, improving the precision of both models. Figure 5 illustrates how the BPNN and SVM are optimized using the GA.

Figure 5. GA optimization flowchart of BPNN and SVM: (a) and (b).
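The selection, crossover and mutation loop of a GA can be sketched as follows. This is a generic real-coded GA written for illustration, not the implementation used in the paper: the operators (tournament selection, arithmetic crossover, uniform mutation, elitism), the default rates and the function `toy_error` are all assumptions. Here `toy_error` is a hypothetical stand-in for the cross-validated error of an SVM as a function of its parameters c and g.

```python
import random

def genetic_minimise(fitness, bounds, pop_size=30, generations=40,
                     crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimise `fitness` over the box `bounds` with a simple real-coded GA."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def tournament(scored):
        # Selection: the fitter of two randomly drawn individuals is chosen.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population), key=lambda s: s[0])
        history.append(scored[0][0])
        next_population = [scored[0][1][:]]  # elitism: keep the best unchanged
        while len(next_population) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                # Arithmetic crossover: blend the two parents gene by gene.
                alpha = rng.random()
                child = [alpha * g1 + (1 - alpha) * g2 for g1, g2 in zip(p1, p2)]
            else:
                child = p1[:]
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    # Mutation: resample the gene uniformly within its bounds.
                    child[k] = rng.uniform(lo, hi)
            next_population.append(child)
        population = next_population
    best = min(population, key=fitness)
    return best, fitness(best), history

def toy_error(params):
    """Hypothetical error surface with its minimum at c = 10, g = 0.5."""
    c, g = params
    return (c - 10.0) ** 2 + (g - 0.5) ** 2

best, best_score, history = genetic_minimise(toy_error, bounds=[(0.1, 100.0), (0.01, 10.0)])
```

Because the best individual is carried over unchanged, the best fitness recorded in `history` never gets worse from one generation to the next. In a GA-SVM setting, `toy_error` would be replaced by a function that trains the SVM with the candidate parameters and returns its validation error.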

Assessment of model function

Three widely used statistical indicators, the mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R²), were used to evaluate the performance of the prediction models28,41,42. Both MAE and RMSE describe the size of the prediction error43, and RMSE is mainly used to choose the optimal prediction model. The lower the MAE and RMSE, the better the prediction model performs. R² quantifies the extent to which the independent variables account for the variation in the dependent variable, and its values typically fall between 0 and 1. The closer R² is to 1, the closer the regression line lies to the sample points, that is, the larger the ratio of the regression sum of squares to the total sum of squares. The fit is best when changes in the independent variable x fully account for changes in the dependent variable y42. The three indicators are defined as follows:

$$ MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\widehat{{y_{i} }} - y_{i} } \right|} $$
(3)
$$ RMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {\widehat{{y_{i} }} - y_{i} } \right)}^{2} } $$
(4)
$$ R^{2} = 1 - \frac{{\sum\limits_{i = 1}^{n} {\left( {\widehat{{y_{i} }} - y_{i} } \right)}^{2} }}{{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \overline{y} } \right)^{2} } }} $$
(5)

where \(\widehat{{y}_{i}}\) and \({y}_{i}\) are the predicted and tested values, respectively; \(\overline{y}\) is the mean of all tested values; and n is the total number of samples in the test set.
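Equations (3)–(5) translate directly into code. The following minimal sketch implements them in plain Python; the tested and predicted values are hypothetical numbers used only to exercise the functions.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (3)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (4)."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (5)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical compressive strengths in MPa, not taken from the paper.
tested = [40.0, 60.0, 80.0, 100.0]
predicted = [43.0, 58.0, 81.0, 98.0]
scores = (mae(tested, predicted), rmse(tested, predicted), r2(tested, predicted))
```

For these toy values the absolute errors are 3, 2, 1 and 2 MPa, so the MAE is 2 MPa, and the residual sum of squares (18) is small compared with the total sum of squares (2000), which gives an R² close to 1.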

Research framework process

The framework of the machine learning model used in this paper to study the compressive strength of HPC is shown in Fig. 6. It comprises four steps:

Figure 6. HPC compressive strength machine learning framework.

Step 1: Database construction.

The database contains 454 samples in total, each comprising PC, W, SP, SF, FA, AE, W/B, S/A and the HPC compressive strength. A multi-correlation matrix analysis was performed on the database to determine the degree of correlation among the input variables. The machine learning models were then developed using 70% of the database for training and 30% for testing.

Step 2: Training of the machine learning model.

In this step, the training set was used to train two independent machine learning models, BPNN and SVM. GA was then employed to optimize the weights and thresholds of the BPNN and the hyperparameters of the SVM, producing two coupled machine learning models: GA-BPNN and GA-SVM.

Step 3: Defining the optimal machine learning model.

The test dataset is used to validate the machine learning models. The optimal model is then chosen using the statistical indicators (MAE, RMSE, R²).

Step 4: Parameter analysis.

GRA and Shapley analysis were used to assess the characteristics and significance of the input variables, and the influence of the number and values of the input variables on the model was investigated. Finally, the experimental results were compared with the predictions generated by six combinations of input variables.
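The 70%/30% partition described in Step 1 can be sketched as a shuffled index split. The random seed and the rounding rule below are assumptions, since the text does not state how the 454 samples were assigned to the two sets.

```python
import random

def train_test_split(samples, train_fraction=0.7, seed=42):
    """Shuffle the sample indices and split them into training and test subsets."""
    rng = random.Random(seed)  # fixed seed (assumed) so the split is reproducible
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_train = round(train_fraction * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test

# Stand-in for the 454 database rows: each "sample" here is just its row number.
train, test = train_test_split(list(range(454)))
```

With this rounding rule, 454 samples yield 318 training samples and 136 test samples; every sample lands in exactly one of the two sets.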

The results of the research and discussion

Performance assessment of models

The number of neurons in the hidden layer influences the performance of the BPNN, whereas the parameters c and g influence the performance of the SVM. A BPNN with a three-layer topology can approximate any nonlinear model44,45. As a result, this paper uses only one hidden layer. If the network has too few neurons, it cannot accurately represent the relationship between input and output. Conversely, too many neurons may lengthen the running time and cause over-fitting. The number of neurons is generally determined with the empirical formula (6). Figure 7 compares the network performance (MSE) for different numbers of hidden-layer neurons. The parameters c and g were optimized through cross validation and the genetic algorithm, and the results are shown in Fig. 8.

Figure 7. Best number of hidden-layer neurons.

Figure 8. c, g parameter optimization: (a) and (b).

Table 3 lists the optimal hyperparameters of the machine learning models, which were determined by trial and error. Some parameters were left at their default values, while the others were fine-tuned through a combination of experience and trial and error. The prediction accuracy of each model was then computed.

$$ L = \sqrt {n + m} + a $$
(6)

Table 3 Optimal hyperparameters for machine learning models.

The variables L, n, and m represent the number of neurons in the hidden layer, output layer, and input layer, respectively, of a BPNN. The constant, a, can be any of the following values. Figure 7 illustrates how the number of neurons in the hidden layer affects the network's MSE. In this research, the ANN was run multiple times to obtain the MSE values of the training and test sets, and their average was used to determine the ideal number of hidden-layer nodes.
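The selection procedure around Eq. (6) can be illustrated with a short sketch. The range of the constant a used below (1 to 10) is an assumption made purely for illustration, since the admissible values are not listed in the text, and the MSE values are hypothetical.

```python
import math

def hidden_neuron_candidates(n, m, a_values):
    """Candidate hidden-layer sizes from L = sqrt(n + m) + a, rounded to integers."""
    base = math.sqrt(n + m)
    return [round(base + a) for a in a_values]

def best_size(mse_runs):
    """Pick the size whose MSE, averaged over repeated runs, is lowest."""
    return min(mse_runs, key=lambda size: sum(mse_runs[size]) / len(mse_runs[size]))

# One output (n = 1) and eight inputs (m = 8); a in 1..10 is an ASSUMED range.
candidates = hidden_neuron_candidates(n=1, m=8, a_values=range(1, 11))

# Hypothetical MSE values from repeated runs for three of the candidate sizes.
chosen = best_size({4: [0.03, 0.05], 5: [0.02, 0.03], 6: [0.04, 0.02]})
```

With one output and eight inputs the square-root term equals 3, so each candidate size is simply 3 + a. Averaging the MSE over repeated runs reduces the influence of the random weight initialization on the choice.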

Figure 9 compares the four machine learning models developed. For all algorithms, the R² value of the training set is consistently lower than that of the test set, although the difference is negligible. The BPNN model has the lowest accuracy, with the lowest R² and larger RMSE and MAE values than the SVM model. Overall, all algorithms in this study perform well, and the hybrid models predict more accurately than the single models. The GA-SVM combination performs best, with the lowest MAE and RMSE values and the highest R² value.

Figure 9. Performance of machine learning: (a)–(c).

The hybrid models outperform the single models, as the comparison in Fig. 9 shows, which demonstrates the advantages of the evolutionary algorithm in stochastic global search and optimization40. GA, for instance, is unlikely to become trapped in a local minimum. GA can thus improve a single predictive model by reducing its prediction error; here, the combined models were applied to estimate the HPC property. Figure 10 compares the tested and predicted compressive strengths for the training and test sets of the database.

Figure 10. Comparison of predictive and tested values of hybrid model: (a)–(d).

Figure 10 displays the goodness of fit between the tested and predicted compressive strength of HPC. The line y = x represents a perfect fit, and the lines y = x ± 5 mark the 5 MPa error boundary of the model predictions. Together, the fitting line and the 5 MPa error boundary show the accuracy and goodness of fit of the two coupled models on the training and test sets. Nearly all of the data points lie close to the line y = x. The R² values of the test sets were 0.9882 and 0.995, respectively, while those of the training sets of the two models were 0.987 and 0.989. The goodness of fit is therefore higher for the test set than for the training set, and the predicted and tested values agree well. All of the test-set data points lie inside the error boundary, and only a small portion of the training-set data points of the two models lie outside it. The RMSE and MAE of GA-SVM are 0.93437 MPa and 0.6726 MPa, respectively. According to these results, GA-SVM has the highest prediction precision among the hybrid models. This is primarily because the genetic algorithm can search and optimize globally, while the SVM algorithm requires fewer parameters and minimizes duplicate learning.
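The share of predictions that fall inside the y = x ± 5 MPa boundary can be checked with a short helper. This is a minimal sketch with hypothetical values; the function name is illustrative and the numbers are not results from the paper.

```python
def fraction_within_band(tested, predicted, band=5.0):
    """Fraction of predictions whose absolute error does not exceed `band` (MPa)."""
    inside = sum(1 for t, p in zip(tested, predicted) if abs(p - t) <= band)
    return inside / len(tested)

# Hypothetical tested and predicted strengths in MPa.
tested = [45.0, 62.0, 78.0, 110.0]
predicted = [47.5, 61.0, 84.2, 108.0]
share = fraction_within_band(tested, predicted)
```

In this toy case three of the four predictions lie inside the boundary; the third one misses the tested value by more than 5 MPa.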

Characteristic importance analysis

Although the GA-SVM hybrid model in this study can accurately predict the compressive strength of HPC, it cannot by itself reveal the influence of the different input factors on the compressive strength. To study the importance of each input variable to the compressive strength of HPC in the GA-SVM coupled model, GRA and Shapley analysis were combined to examine the influence of the input features on model performance42.

Theory of GRA

GRA performs quantitative analysis by measuring the correlation between a parent sequence (reference sequence) and a sub-sequence (comparison sequence) from the sequence curves formed by the data set. The data of the research object serve as the reference sequence, and the variables that affect the research object serve as the comparison sequences. The degree of correlation is determined by comparing the sequence curves: the degree of relevance quantifies the similarity between their geometric shapes. The factors with the largest influence on the reference sequence can then be identified from the correlation calculations46,47,48. The primary idea is to examine the degree of connection between each comparison sequence and the reference sequence using a simple model, with the aim of ranking the comparison sequences by their influence. The reference and comparison sequences are denoted RS and CS, respectively.

$$ X_{0} = \, \left\{ {x_{0} \left( {1} \right),x_{0} \left( {2} \right) \, \ldots ,x_{0} \left( {\text{n}} \right)} \right\} \, \left( {{\text{RS}}} \right) $$
$$ X_{i} = \, \left\{ {x_{i} \left( {1} \right),x_{i} \left( {2} \right) \, \ldots ,x_{i} \left( {\text{n}} \right)} \right\} \, \left( {{\text{CS}}} \right) $$

Here i indexes the comparison sequences, of which there are 8 in this paper.

The following are the stages involved in operating GRA:

  1. Dimensionless method: Because the ranges and units of the raw data differ, dimensionless processing is required. However, since the data in this study contain zero values, mean-value normalization cannot be used. In this paper, the data are instead made dimensionless by standardization:

$$ x_{i}^{\prime} (z) = \frac{{x_{i} (z) - \mu }}{\sigma } $$
(7)

where z = 1, 2…, n-1, n. \(\mu\) is the mean value of the data sample and \(\sigma\) is the standard deviation of the data sample.

  2. Solve for the gray relational coefficient:

$$ \zeta_{i} ({\text{z}}) = \frac{{\mathop {\min }\limits_{i} \mathop {\min }\limits_{z} \left| {x_{0} (z) - x_{i} (z)} \right| + \rho \cdot \mathop {\max }\limits_{i} \mathop {\max }\limits_{z} \left| {x_{0} (z) - x_{i} (z)} \right|}}{{\left| {x_{0} (z) - x_{i} (z)} \right| + \rho \cdot \mathop {\max }\limits_{i} \mathop {\max }\limits_{z} \left| {x_{0} (z) - x_{i} (z)} \right|}} $$
(8)

where \(\rho \in (\text{0,1})\) represents the resolution coefficient. The resolution will increase as the resolution coefficient decreases. The resolution coefficient is set as 0.2 in this paper.

  3. Solve for the gray relational degree:

$$ \gamma_{i} = \frac{1}{n}\sum\limits_{z = 1}^{n} {\zeta_{i} (z)} $$
(9)

where z = 1, 2…, n-1, n.

  4. Sort by gray relational degree: The comparison sequences are ranked by the gray relational degree obtained above, which measures how close each comparison sequence is to the reference sequence.
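The four stages above, following Eqs. (7)–(9), can be sketched in plain Python. The sketch assumes that each sequence is standardized with its own mean and population standard deviation, which the text does not specify; the sequence names and values are hypothetical.

```python
import math

def standardise(seq):
    """Eq. (7): subtract the mean and divide by the (population) standard deviation."""
    n = len(seq)
    mu = sum(seq) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in seq) / n)
    return [(v - mu) / sigma for v in seq]

def gray_relational_degrees(reference, comparisons, rho=0.2):
    """Eqs. (8)-(9): gray relational degree of each comparison sequence."""
    x0 = standardise(reference)
    deltas = {}
    for name, seq in comparisons.items():
        xi = standardise(seq)
        deltas[name] = [abs(a - b) for a, b in zip(x0, xi)]
    # Global minimum and maximum differences over all sequences and all points.
    d_min = min(min(d) for d in deltas.values())
    d_max = max(max(d) for d in deltas.values())
    degrees = {}
    for name, d in deltas.items():
        coeffs = [(d_min + rho * d_max) / (dk + rho * d_max) for dk in d]
        degrees[name] = sum(coeffs) / len(coeffs)
    return degrees

# Hypothetical sequences: X1 is a linear function of the reference, X2 is not.
reference = [40.0, 55.0, 70.0, 90.0]
comparisons = {
    "X1": [83.0, 113.0, 143.0, 183.0],
    "X2": [0.5, 0.3, 0.4, 0.2],
}
degrees = gray_relational_degrees(reference, comparisons)
ranking = sorted(degrees, key=degrees.get, reverse=True)
```

After standardization X1 coincides with the reference, so its degree is 1, the upper limit; X2 follows the reference less closely and ranks second. A constant sequence would have zero standard deviation and would need separate handling before this function is applied.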

GRA analysis enhances model interpretability

The standard 28-day cube compressive strength of HPC is taken as the reference sequence, and its 8 components are taken as the comparison sequences. The GRA model of HPC compressive strength is then established in MATLAB. The results are shown in Fig. 11.

Figure 11. Gray relational rank of input variables.

Figure 11 presents the correlation analysis between the input variables and the output values of the GA-SVM prediction model. A correlation degree above 0.7 is typically considered significant, 0.5 to 0.7 fairly significant, and below 0.5 insignificant. The results show that the main factor affecting the compressive strength of HPC is W/B49,50,51. If W/B is too small, the concrete cracks easily and is difficult to place on site; conversely, excessive W/B reduces the compressive strength. An appropriate W/B is therefore crucial. SF, W, PC and AE have less influence on compressive strength than W/B, but their importance values still fall within the fairly significant range. SP, S/A and FA fall within the insignificant range because their importance values are below 0.5. Although these features have little influence on strength, they can still be used to improve the compressive strength of HPC to some extent. For example, adding SP at an appropriate water-cement ratio can change the void structure inside the concrete and the final form of the hydration products. S/A affects the mechanical properties, workability and durability of HPC: when it is kept within a tolerable range, the concrete mixture can achieve higher fluidity while maintaining good cohesion and water retention, which improves strength. According to GRA theory, the closer the correlation value is to 1, the greater the impact of the feature on the compressive strength of HPC49. It can be concluded that all the features affect the compressive strength of HPC, with W/B, SF and W being the most influential. Therefore, to improve the calculation speed of the model and reduce the workload of researchers, the less important features can be removed when predicting the compressive strength of HPC.

Shapley analysis enhances model interpretability

Shapley analysis is an analytical method that enhances the interpretability of machine learning models. It uses a feature importance indicator to describe how strongly each input variable in the database affects the output value. The feature importance results of the Shapley analysis carried out in this paper are shown in Fig. 12. Among the 8 features, W/B has the most significant influence on compressive strength, with an importance value of 5.794. SF, with an importance value of 4.872, also contributes strongly to compressive strength, followed by AE, PC, SP, S/A and FA. The results are broadly consistent with those of GRA.

It is crucial to understand the specific impact of each feature in the database on the output target of the machine learning model. The summary of SHAP values in Fig. 12 therefore shows the impact of each feature on the output target. Each point in the diagram represents the SHAP value of a variable. The color of the dots, from pink to blue, indicates that the influence of each feature ranges from strong to weak. The horizontal axis represents the SHAP value of each feature, and the vertical axis lists the features in descending order. A data point in the negative region has a negative effect on the output, and a point in the positive region has a positive effect. Accordingly, a lower W/B increases the compressive strength of HPC, and a lower SF content decreases it.

Figure 12. Shapley feature importance analysis diagram: (a)–(b).
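To make the idea behind Shapley values concrete, the sketch below computes them exactly for a tiny model by averaging each feature's marginal contribution over all feature subsets. It is an illustration only: features outside a subset are set to a single baseline value, which is a simplifying assumption, and the linear function `toy_strength` with its coefficients is hypothetical rather than the GA-SVM model of the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take their actual value, the rest the baseline value.
        mixed = [x[k] if k in subset else baseline[k] for k in features]
        return model(mixed)

    phi = []
    for i in features:
        others = [k for k in features if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_strength(features):
    """Hypothetical linear model: strength falls with W/B and rises with SF."""
    wb, sf = features
    return 120.0 - 100.0 * wb + 0.1 * sf

x = [0.3, 50.0]          # sample to explain: W/B = 0.3, SF = 50 (toy values)
baseline = [0.4, 20.0]   # reference point: W/B = 0.4, SF = 20 (toy values)
phi = shapley_values(toy_strength, x, baseline)
```

In this toy case lowering W/B from 0.4 to 0.3 contributes a positive amount to the predicted strength, and the contributions of the two features add up to the difference between the prediction for `x` and the prediction for `baseline`. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations.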

Effect of the amount of input variables

In this part, the results of the feature importance analysis are combined with the best prediction model, GA-SVM, to study how the model's performance depends on its inputs. The compressive strength predicted by the numerical model is affected by the number of features and their significance42. Therefore, guided by the feature importance results, this study progressively removed features until only three input variables (AE, PC and SP) remained, establishing six numerical prediction models in total. The goal is to study how the performance of the model changes when certain variables are ignored. Table 4 lists the performance indicators of the six models, and the tested and predicted values are shown in Fig. 13.

Table 4 Combination performance of different input variables.

Figure 13. Results of model prediction comparison: (1)–(6).

Figure 13 shows the predictions of the six models built according to the importance of each input feature to compressive strength. Together with Table 4, it shows that the evaluation indicators of the model decline overall as the number of input features decreases, with a maximum decline of 11.97%. This is because model a has all the input features, whereas model e has the fewest. Model f has equally few features, but it performs much better than model e because it includes the more important feature AE, whereas the features of model e are less important. The influence of the importance of AE on the model can be seen by comparing models d and e. The importance and the number of input features therefore jointly determine the predictive performance of the model.

In engineering practice, it is crucial to quickly find important information in large amounts of data. GRA and Shapley analysis accomplish this step by using mathematical models to assess the importance of the database features. They can therefore be used to reduce the dimensionality of the database by reducing the number of input features, which lets the model run faster while maintaining accuracy and retaining the important data.

Conclusion

In this paper, machine learning technology is used to study the nonlinear relationship between the HPC compressive strength and its influencing factors under large data samples, and the prediction model of its compressive strength is developed. The accuracy and performance of the model are evaluated based on a variety of statistical indexes. Finally, GRA analysis and Shapley analysis are used to verify and understand the relative importance and influence of each input feature on the output target.

  1. The four machine learning models constructed in this study all predict the compressive strength of HPC with good accuracy. The prediction accuracy and performance of the GA-SVM model are higher than those of the other models in both the training and testing stages. Its R² value is 0.9882 in the training phase and 0.995 in the test phase, so the model performs better in testing than in training. This shows that the developed GA-SVM model has a strong data fitting ability and can fully capture the nonlinear relationship between the features and the output target.

  2. GRA and Shapley analysis together show that the input features of the HPC compressive strength database built in this study rank in importance as follows: W/B, SF, W, AE, PC, SP, S/A, FA. This is of great significance for understanding the structure and characteristics of concrete.

  3. The six models built according to the feature importance ranking show that model performance decreases as the number of input features decreases. Model (e) declines the most, by up to 11.97% compared with model (a). On the one hand, model (e) has the fewest input features; on the other hand, AE has a high degree of contribution to model (e). This confirms that the feature importance ranking obtained in this paper is correct and can support subsequent research on optimizing the mix ratio of high-performance concrete.

  4. In follow-up work, the GA-SVM machine learning model can be used to study the durability of HPC and then be combined with the NSGA-II multi-objective optimization algorithm to optimize the mix ratio design of high-performance concrete. Reducing the manufacturing cost of HPC while improving its durability and compressive strength would contribute to the "double carbon" goal.