Prediction of HPC compressive strength based on machine learning

Jin, Libing; Duan, Jie; Jin, Yichen; Xue, Pengfei; Zhou, Pin

doi:10.1038/s41598-024-67850-9

Prediction of HPC compressive strength based on machine learning

Article
Open access
Published: 22 July 2024

Volume 14, article number 16776, (2024)
Cite this article

Download PDF

You have full access to this open access article

Scientific Reports

Prediction of HPC compressive strength based on machine learning

Download PDF

Libing Jin¹,
Jie Duan^1,3,
Yichen Jin²,
Pengfei Xue¹ &
…
Pin Zhou¹

477 Accesses
Explore all metrics

Abstract

There is a complex high-dimensional nonlinear mapping relationship between the compressive strength of High-Performance Concrete (HPC) and its components, which has great influence on the accurate prediction of compressive strength. In this paper, an efficient robust software calculation strategy combining BP Neural Network (BPNN), Support Vector Machine (SVM) and Genetic Algorithm (GA) is proposed for the prediction of compressive strength of HPC. 8 features were extracted from the previous literature, and a compressive strength database containing 454 sets of data was constructed. The model was trained and tested, and the performance of 4 Machine Learning (ML) models, namely BPNN, SVM, GA-BPNN and GA-SVM, was compared. The results show that the coupled model is superior to the single model. Moreover, because GA-SVM has better generalization ability and theoretical basis, its convergence speed and prediction accuracy are better than GA-BPNN. Then Grey Relational Analysis (GRA) and Shapley analysis were used to verify the interpretability of the GA-SVM model, which showed that the water-binder ratio had the most significant influence on the compressive strength. Finally, the combination of multiple input variables to evaluate the compressive strength supplemented this research, and again verified the significant influence of water-binder ratio, providing reference value for subsequent research.

Prediction of compressive strength of high-performance concrete (HPC) using machine learning algorithms

Article 14 December 2023

Employing the optimization algorithms with machine learning framework to estimate the compressive strength of ultra-high-performance concrete (UHPC)

Article 07 July 2023

A hybrid machine learning model to estimate self-compacting concrete compressive strength

Article 01 August 2022

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Introduction

HPC is a widely utilized cement-based composite material¹, consisting of water, cement, coarse aggregate, and additional mineral admixtures and chemical admixtures. It is produced through mixing, vibrating, forming, curing and solidification in specific proportions based on engineering experience or test formulas. Its widespread use is attributed to its excellent plasticity, fire resistance, corrosion resistance and utilization of local materials. The concept of HPC was first proposed by the National Institute of Technology and the American Concrete Institute in 1950. In contrast to ordinary concrete, HPC offers superior durability, workability and volume stability².

At the same time, the mapping relationship between each component of HPC and its compressive strength becomes more complicated and multi-dimensional. Traditional modeling methods are based on scientists' descriptions of objective things and often need to oversimplify practical problems. This will cause the model to fail to capture enough mapping relationships and the error will be too large to achieve the desired effect. In addition, traditional modeling methods are faced with many difficulties in data acquisition, processing and final verification. Therefore, in the face of more dimensional mapping relationship between components of HPC and compressive strength, even on the basis of existing empirical regression formulas^3,4,5 and a large number of test sample data, it is difficult to derive a universal equation that can reach the accuracy of compressive strength measured in laboratory environment^6,7,8,9.

With the continuous development of artificial intelligence, the powerful data identification and nonlinear fitting capabilities of ML make it have certain advantages in solving complex engineering problems¹⁰. Among them, a lot of research has been done on the prediction of the basic performance of concrete. With the development of concrete, ML has become a new tool for its performance research, and the results are fruitful.

Adding various fibers, nanoparticles or other substances to concrete can improve its mechanical properties¹¹ to form a new type of concrete, but there are few studies on its performance prediction. Chithra et al.¹² studied the use of artificial neural networks to accurately predict the compressive strength of concrete containing nano-silica particles and copper slag. Ashrafian et al.¹³ used a heuristic regression method to predict the strength and ultrasonic pulse velocity of fiber-reinforced concrete. 3D printed concrete has attracted wide attention in recent years. Its short construction time, low labor cost and high design freedom make it suitable for various fields¹⁴. However, its anisotropy makes its compressive strength unreliable in predicting. Then, the beetle antenna search technology proposed by¹⁵ can automatically search the optimal hyperparameters, thus realizing the accurate prediction of the compressive strength of 3D printed concrete under steam curing conditions. Long et al.¹⁶ established a long- and short-term memory network model to further improve the accuracy of predicting the dynamic compressive strength of concrete-like materials at high strain rates. The above research provides certain ideas for the performance prediction of other types of concrete and has practical guiding significance. More researches on ML are shown in Table 1 below.

Table 1 A synopsis of models from earlier research.

Full size table

In short, ML as a sign of the age of intelligence, it opens up countless possibilities in civil engineering. It not only improves production efficiency, but also creates unprecedented changes in many aspects such as structural quality control and structural safety monitoring. Especially in the era of "double carbon", machine learning technology plays a huge role in the development and performance improvement of building materials. Therefore, to enable civil engineering, it will be able to help the early achievement of the "double carbon" goal, and help realize the diversified and intelligent development of civil engineering.

In this paper, the single model of BPNN and SVM is combined with GA optimization algorithm to make up for the lack of parallel computing capability of SVM, easy to fall into local optimal solution and easy to disappear gradient during BPNN training. At the same time, more input variables are introduced to further improve the prediction accuracy and reliability of the prediction model. In addition, the interpretability of the model was verified by GRA and Shapley analysis, which ensured the reliability of the model interpretation. It provides guidance for subsequent researchers to evaluate the compressive strength of HPC.

Description and analysis of database

For machine learning techniques, database compilation is a fundamental and critical step. In order to make the research results universal, the composition of HPC is studied in this paper. Experimental data samples were collected from 9 literatures, and outlier data sets with water-binder ratio less than 0.1 or greater than 0.5 were eliminated, so as to establish a 28-day HPC cube compressive strength database. Table 2 lists detailed statistics of the data used in this article.

Table 2 HPC compressive strength test parameters .

Full size table

The distribution of input variables greatly affects the degree of generalization of the constructed model³⁶. The cumulative frequency range histogram in Fig. 1 illustrates how input factors affect the compressive strength of HPC. From the graph, it can be seen that the frequency of the vector is appropriately high, and the range of input variables is wide. Cement content ranges from 220 to 708 (kg/m³) while most values are between 300 and 650 (kg/m³). Compared with cement content, the water content range is relatively narrow, and most are between 160 and 185 (kg/m³). Regarding the content of silica fume and fly ash, there are many cases where the content is zero, and the dispersed content of the two is between 0 and 200 (kg/m³) and 0–275 (kg/m³), respectively, and the change is relatively large. The water-binder ratio is another important vector influencing the compressive strength of HPC^37,38, and its value is mainly concentrated around 0.3. In this research, the compressive strength of HPC ranges from 38 to 123 MPa and is concentrate between 40 and 80 MPa.

Figure 2. shows the multiple-correlation matrix. Different colors represent different correlation values. The horizontal and vertical cross factors have a positive association when the value is positive, and a negative correlation when it is negative. Among the 8 input variables selected, the correlation between superplasticizer and silica fume content was the strongest (R = 0.84), which is consistent with previous research. It is commonly known that the strength and mechanical characteristics of HPC can be considerably impacted by the dosage of silica fume and superplasticizer^37,38. Regarding the relation between input and output variables, superplasticizer and compressive strength have the largest positive association, with silica fume and cement following closely behind. In general, there is little difference in the positive and negative relevance found between the input and output variables. Consequently, to guarantee the model's accuracy, all eight input variables were employed.

Machine learning approaches

BPNN

ANN is a calculation model based on simulating the connection and excitation suppression of neurons in the human brain. Typically, its structure is composed of three layers of neurons: the input layer, hidden layer, and output layer. Neurons in the input layer are responsible for receiving information from the surrounding environment, and neurons in hidden layer and output layer are in charge of carrying out linear and nonlinear approximations of the system under investigation. A single microprocessor will add up the weighted values received from the input layer neurons during the linear phase of ANN. The function of activation is used to the sum during the nonlinear phase, and the outcome is sent as the microprocessor's output. In an ANN, weights are used to connect each layer's neurons to one another sequentially. Figure 3 illustrates the ANN's structural layout. The total amount of neurons in the input layer and output layer, respectively, is represented by the amount of features and tags in data set. It is possible to ascertain the amount of neurons in the hidden layer using an empirical formula method or a trial-and-error approach.

Support vector machine regression (SVM)

Cortes and Vapnik³⁹ present SVM as a solution for categorization issues. In order to solve regression and prediction issues, SVM is a significant branch that SVM extended. These kinds of regression and prediction models usually compute model losses using the difference value between the model's expected and actual output values. The loss is only zero when the predicted value of the model matches the true value. As shown in Fig. 4, the low dimensional feature space of the sample can be transformed into a high-dimensional feature space using Gaussian radial basis functions to enhance the performance of the regression model and better match the collected data. The output of the SVM is expressed as a linear function, as shown below:

$$ f(x) = \left\langle {\omega \varphi (x)} \right\rangle + b $$

(1)

where 〈$\cdot $〉represents point function; $\omega $ is the minimal value obtained from the following equation.

$\varphi (x)$ is a mapping of input features to higher-dimensional feature spaces; b is one of the function's parameters vectors.

$$ \min imise\left[ {\frac{1}{2}\left\| \omega \right\|^{2} + C\sum\nolimits_{j - i}^{n} {\left( {\xi_{j} + \xi_{j}^{ * } } \right)} } \right] $$

(2)

$$ subjectto\{ y_{i} - \omega \varphi \left( {x_{i} } \right) - b \le \varepsilon + \xi_{j} \omega \varphi \left( {x_{i} } \right) + b - y_{j} \le \varepsilon + \xi_{j}^{*} \xi_{j} ,\xi_{j}^{*} \ge 0 $$

where C for the regularization parameter; n represents the quantity of samples; ${\xi }_{j}$,${\xi }_{j}^{*}$ are slack variables; $\varepsilon $ is insensitive loss function; ${y}_{i}$ is experimental value.

Principle of genetic algorithm (GA)

The natural biological evolution system's computer numerical simulation technology is where GA originated. It is a technique for competition based on random global search and optimization that was created by modeling how organisms naturally evolve⁴⁰. It is utilized for hyperparameter modification in various machine learning algorithms and is based on Mendelian genetics and Darwin's theory of evolution. In an iterative search process, it can automatically search, gather information about the search space, and adaptively regulate the search to find the best solution. Consequently, it can be considered a worldwide, efficient, parallel search heuristic method. Because the conventional BPNN has a tendency to enter a local minimum, the SVM hyperparameter adjustment is crucial. The baseline weights and biases of the BPNN are adjusted in this work using the random global search and optimization capabilities of the GA. Additionally, the global search and optimization of the SVM's hyperparameters is employed to improve the model's precision. Figure 5 illustrates how BPNN and SVM are optimized using GA.

Assessment of model function

Three widely used statistical indicators—Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Squared Correlation Coefficient (R²)—were utilized to evaluate the built prediction model in order to determine its performance^28,41,42. While both MAE and RMSE provide insight on the size of the participation error⁴³, the RMSE data are mostly used to choose the optimal prediction model. With lower MAE and RMSE values, the established prediction model performs better. R² is a statistical metric that quantifies the extent to which the independent variable can account for variations in the dependent variable. Its values fall between 0 and 1. A regression line is said to be closer to each sample test point if the R² value is endlessly close to 1. This suggests that the larger the ratio of the regression's sum of squares to the population's sum of squares. Regression fitting works best when changes in independent variable x can completely account for changes in the value of the dependent variable, y⁴². The formulas of the three indicators are as follows:

$$ MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\widehat{{y_{i} }} - y_{i} } \right|} $$

(3)

$$ RMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {\widehat{{y_{i} }} - y_{i} } \right)}^{2} } $$

(4)

$$ R^{2} = 1 - \frac{{\sum\limits_{i = 1}^{n} {\left( {\widehat{{y_{i} }} - y_{i} } \right)}^{2} }}{{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \overline{y} } \right)^{2} } }} $$

(5)

where $\widehat{{y}_{i}}$、${y}_{i}$ are predicted value and tested value respectively; ‾y is the mean of all tested values; the test set's overall amount of samples is denoted by n.

Research framework process

The frame structure of machine learning model for studying compressive strength of HPC in this paper is shown in Fig. 6. There are generally four steps:

Step 1: Database construction.

There are 454 samples overall in the built database, including PC, W, SP, SF, FA, AE, W/B, S/A, and HPC compressive strengths. The database was subjected to multi-correlation matrix analysis to ascertain the level of relationship among the input variables. Then, utilizing 70% of database for training and 30% for testing, the machine learning model was put to use.

Step 2: Training of the machine learning model.

The training set was applied in this step to train two independent machine learning models of the suggested BPNN and SVM. Furthermore, GA was employed to optimize the model's weights, thresholds, and hyperparameters, producing two coupled machine learning models: GA-BPNN and GA-SVM.

Step 3: Defining the optimal machine learning model.

The test dataset is applied to validate the machine learning model. Next, using statistical markers (MAE, RMSE, R²), the optimal model is chosen.

Step 4: Parameter analysis.

GRA and shapley analysis were used to test the characteristics and significance of input variables, and the influence of the number and value of input variables on the model was investigated. Finally, the actual experimental results are analyzed and compared with the predicted results generated by the six combinations.

The results of the research and discussion

Performance assessment of models

The quantity of neurons in hidden layer of BPNN influences its property, whereas the c and g parameters have an impact on SVM performance. BPNN may approximate any nonlinear model of topological structure with three layers^44,45 As a result, this paper only used one hidden layer. If there are not enough neurons in the network, it cannot accurately depict the connection between input and output. On the other hand, an excess of neurons may result in an extended running period and the over-fitting phenomenon. Generally, the empirical formula (6) is used to determine the quantity of neurons. The results of comparing and selecting the neuron count of hidden layers on network performance (MSE) are shown in the Fig. 7. The parameters c and g were optimized through cross validation and genetic algorithm, and the results are shown in Fig. 8.

Table 3 shows the course of trial and error that was utilized to determine the optimal hyperparameters for the machine learning. While certain parameters were left as default, each ideal parameter utilized in the model was fine-tuned using a combination of expertise and trial-and-error to get its best value. Next, each model's forecast precision was computed.

$$ L = \sqrt {n + m} + a $$

(6)

Table 3 Optimal hyperparameters for machine learning models.

Full size table

The variables L, n, and m represent the quantity of neutrons in hidden layer, output layer, and input layer, respectively, of a BPNN. The constant, a, can be any of the following values. Figure 7 illustrates how the quantity of neutrons in hidden layer affects the network's MSE. In this research, the ANN was run multiple times to get the train and test sets' MSE values. Their average was then computed to determine the ideal number of hidden layer nodes.

The comparison of the four developed machine learning models is displayed in Fig. 9. The training set's R² value is consistently less than the test set's, even for all employed algorithms, albeit the difference is negligible. The model that employs BPNN has the lowest accuracy since it has the lowest R² value and bigger RMSE and MAE values than the SVM model. Overall, all algorithm of this study exhibit strong performance, with accuracy in predicting surpassing that of a single model. But the GA-SVM combination fared better than the others, with the lowest MAE and RMSE values as well as the greatest R² value.

The advantages of the evolutionary algorithm in stochastic global search and optimization are evident in the hybrid model's superior performance over the single model, as demonstrated by the comparative findings in Fig. 9⁴⁰. GA, for instance, is difficult to trap in a local minimum. Thus, GA can enhance the efficacy of a single predictive model by reducing prediction error; in this instance, a combination of models was applied to estimate the HPC property. For the train and test sets of the database, Fig. 10 presents a comparison and comparison of the actual and predictive values of compressive strength.

Figure 10 displays the goodness of fit of the tested and anticipated compressive strength of HPC. The figure's perfect fitting curve is represented by y = x, and the 5 MPa model prediction accuracy error boundary is indicated by the crossing line y = x ± 5. The training and test sets' compressive strengths, as predicted by the two coupling models, are demonstrated in terms of their accuracy and goodness of fit using the fitting curve and 5 MPa error boundary. The findings demonstrated that nearly all of the data points were situated close to the curve y = x. The test set's values of R² were 0.9882 and 0.995, respectively, while the two model training sets' R² values were R² = 0.987 and R² = 0.989. This shows that the test set's goodness of fit is higher than the training set's, and the model's predicted and tested values have a better goodness of fit. The error border shows that all of the test set's data points are inside the boundary, but only a small portion of the training set's data points for the two models are outside of it. GA-SVM's RMSE and MAE values are, respectively, 0.93437 MPa and 0.6726 MPa. According to the results, GA-SVM has the highest prediction precision in the hybrid model. This is primarily because genetic algorithms have the ability to search globally and optimize them, while SVM algorithms require fewer parameters and minimize duplicate learning.

Characteristic importance analysis

Although the GA-SVM mixed model in this study can accurately predict the compressive strength of HPC, it cannot test the influence of different input factors on the compressive strength of HPC. In order to study the importance of each input vector to the compressive strength of HPC in GA-SVM coupling model. In this paper, GRA and shapely analysis methods were used to combine the features to study the influence of the input features on the model performance⁴².

Theory of GRA

By measuring the correlation between a parent sequence (reference sequence) and a sub reference (comparison sequence) using the sequence curve that the current data set forms, GRA is able to perform quantitative analysis. The research object's data set serves as the reference sequence, and the pertinent variables that have an impact on the research object serve as the GRA comparison sequence. It is possible to ascertain the degree of correlation by comparing the two sequence curves. This method uses the degree of relevance—which is a quantification of the geometric form similarity—to reflect the degree of similarity between the geometric shapes of sequence curves. The largest impact on the system's reference sequence correlation can be identified using correlation calculations^46,47,48. The primary idea is to look into the degree of connection between comparison sequences and reference sequence pairs using a simple model. Therefore, the best possible sorting of the comparison sequence is the aim of influence. It is assumed that the comparison and reference sequences are RS and CS, respectively.

$$ X_{0} = \, \left\{ {x_{0} \left( {1} \right),x_{0} \left( {2} \right) \, \ldots ,x_{0} \left( {\text{n}} \right)} \right\} \, \left( {{\text{RS}}} \right) $$

$$ X_{i} = \, \left\{ {x_{i} \left( {1} \right),x_{i} \left( {2} \right) \, \ldots ,x_{i} \left( {\text{n}} \right)} \right\} \, \left( {{\text{CS}}} \right) $$

i is the amount of CS, which is 8 in this paper.

The following are the stages involved in operating GRA:

1.
Dimensionless method: Since the range and units of initial data collection vary, dimensionless processing is required. However, due to the existence of 0 information in the data of this study, the dimensionality cannot be treated by means of averaging. In this paper, the dimensionless data is processed in a standardized way.

$$ x_{i} (z){\prime} = \frac{{x_{i} (z) - \mu }}{\sigma } $$

(7)

where z = 1, 2…, n-1, n. $\mu$ is the mean value of the data sample and $\sigma$ is the standard deviation of the data sample.

2.
To solve the gray relation coefficient:

$$ \zeta_{i} ({\text{z}}) = \frac{{\mathop {\min }\limits_{i} \mathop {\min }\limits_{z} \left| {x_{0} (z) - x_{i} (z)} \right| + \rho \cdot \mathop {\max }\limits_{i} \mathop {\max }\limits_{z} \left| {x_{0} (z) - x_{i} (z)} \right|}}{{\left| {x_{0} (z) - x_{i} (z)} \right| + \rho \cdot \mathop {\max }\limits_{i} \mathop {\max }\limits_{z} \left| {x_{0} (z) - x_{i} (z)} \right|}} $$

(8)

where $\rho \in (\text{0,1})$ represents the resolution coefficient. The resolution will increase as the resolution coefficient decreases. The resolution coefficient is set as 0.2 in this paper.

3.
To solve the degree of gray relation:

$$ \gamma_{i} = \frac{1}{n}\sum\limits_{z = 1}^{n} {\zeta_{i} (z)} $$

(9)

where z = 1, 2…, n-1, n.

4.
Sorting by gray relationship degree: The measure of sequence proximity and the definition of the connection level order between the parent and subsequences are based on the level of gray relation solved.

GRA analysis enhances model interpretability

The standard compressive strength of HPC cube at 28 days of age is taken as reference sequence, and its 8 components are used as factors to establish a comparison sequence, and then the GRA model of HPC compressive strength is established by MATLAB. The model research results are shown in Fig. 11.

A correlation analysis of the GA-SVM prediction model's input variables and output values is presented in Fig. 11. A correlation degree more than 0.7 is typically considered significant, 0.5 to 0.7 is considered rather significant, and the remaining correlation degrees are considered insignificant. The research results show that the main factor affecting the compressive strength of HPC is W/B^49,50,51. W/B too small will make the concrete easy to crack, not conducive to site construction operations. Conversely, excessive W/B will reduce the compressive strength. Therefore, appropriate W/B is crucial. Secondly, although SF, W, PC and AE have less influence on compressive strength than W/B, the importance value of the features still belongs to a fairly significant range. However, SP, S/A and FA belong to the non-significant range because their feature importance values are lower than 0.5. Although these characteristics have little influence on the strength, it is feasible to appropriately improve the compressive strength of HPC. For example, the addition of SP at an appropriate water-cement ratio can change the void structure inside the concrete and the final form of the hydration product. The mechanical properties, workability and durability of HPC are affected by S/A. When the concrete mixture falls within the tolerable range, it can achieve increased fluidity, maintain good cohesion and water retention, and thus improve strength. According to GRA theory, the closer the correlation value is to 1, the greater the impact of this feature on the compressive strength of HPC⁴⁹. It can be concluded that all the characteristics have an effect on the compressive strength of HPC, among which W/B, SF and W are the most influential. Therefore, in order to improve the calculation speed of the model and reduce the workload of researchers, the less important elements can be removed when predicting the compressive strength of HPC..

Shapley analysis enhances model interpretability

Shapley analysis is an analytical method that enhances the interpretability of machine learning models. It uses the feature importance indicator to describe the importance of the input variable of the database to affect the output value. Shapley analysis was carried out in this paper, and the results of feature importance are shown in the figure. Among the 8 characteristics, W/B has the most significant influence on compressive strength, and its value can reach 5.794. In addition, the importance value of SF feature was 4.872, which also contributed more to compressive strength, followed by AE, PC, SP, S/A and FA. Compared with the analysis results of GRA, the two analysis results are roughly the same.

It is crucial to understand the specific impact of each feature in the database on the machine learning output target. Therefore, the summary of SHAP values in Fig. 12 shows the reader the impact of each feature on the output target. Each point in the diagram represents the SHAP value of the variable. The color of the dots from pink to blue indicates that the influence of each feature is from strong to weak. The x horizontal axis represents the SHAP value of each feature, and the y vertical axis displays each feature in descending order. In addition, the data point located in the negative region has a negative correlation effect on the output, and vice versa has a positive correlation effect. Therefore, a lower W/B will increase the compressive strength of HPC, and a lower SF content will decrease the compressive strength of HPC.

Effect of the amount of input variables

In this part, the performance of the GA-SVM model is studied by combining the results of the feature importance analysis of the database with the best prediction model GA-SVM. The compressive strength predicted by the numerical model is affected by the number of features and their significance⁴². Therefore, on the basis of the feature importance analysis results, this study gradually ignored some features, and finally only retained three input variables, AE, PC and SP, and established six numerical prediction models. The goal is to ignore certain variables and study how the performance of the model changes. Table 4 shows the performance evaluation indexes of 6 kinds of combination models, and the specific test values and predicted values are shown in Fig. 13.

Table 4 Combination performance of different input variables.

Full size table

Figure 13 shows the predictions of six models created according to the importance of compressive strength for each input feature. Combined with Table 4, it can be seen that with the reduction of the number of input features, the evaluation indicators of the model show an overall decline trend, with the maximum decline rate of 11.97%. This is because model a has all the input features and model e has the fewest input features. Model f also has the least number of features, but its performance is much better than model e. This is because model f has higher feature importance AE, and model e has lower feature importance. The influence of AE feature importance on the model can be compared from models d and e. Therefore, the importance and number of input features together determine the predictive performance of the model.

In engineering practice, it is crucial to quickly find important information from large amounts of data. The main goal of GRA and shapley's analysis is to accomplish this critical step by using mathematical models to assess the importance of database features. Therefore, GRA and shapley analysis are used to reduce the dimensionality of the database by reducing the number of input features. This allows the model to run faster while maintaining accuracy and identifying important data.

Conclusion

In this paper, machine learning technology is used to study the nonlinear relationship between the HPC compressive strength and its influencing factors under large data samples, and the prediction model of its compressive strength is developed. The accuracy and performance of the model are evaluated based on a variety of statistical indexes. Finally, GRA analysis and Shapley analysis are used to verify and understand the relative importance and influence of each input feature on the output target.

1.
The four machine learning models constructed in this study all have good prediction accuracy for the compressive strength of HPC. Moreover, the prediction accuracy and performance of GA-SVM model are higher than other models in both training and testing stages. The R² value of the training phase is 0.9882, and the R² value of the test phase is 0.995. Obviously, the accuracy and performance of the model testing phase are better than that of the training phase. This shows that the developed GA-SVM model has a strong data fitting ability and can fully capture the nonlinear relationship between features and output targets.
2.
GRA and Shapley analysis together show that the importance of the characteristics of the HPC compressive strength database built in this study for the characteristics of compressive strength is sorted as follows: W/B, SF, W, AE, PC, SP, S/A, FA. This is of great significance for understanding the structure and characteristics of concrete.
3.
The performance of the six models constructed by the combination of feature importance ranking shows that the performance of the model shows a decreasing trend with the decrease of the number of input features. Model (e) has the largest decline, with a decrease of up to 11.97% compared to model A. For one thing, model (e) has the fewest input features. On the other hand, AE has a high degree of contribution to model e. This confirms that the feature importance ranking analyzed in this paper is correct, and can provide help for the subsequent research on the optimization of the mix ratio of high performance concrete.
4.
In the follow-up work, GA-SVM machine learning model can be used to study the durability of HPC. Then combined with NSGA-II multi-objective optimization algorithm, the mix ratio optimization design of high performance concrete is carried out. On the basis of the improvement of durability and compressive strength, the manufacturing cost of HPC concrete is reduced, which makes a certain contribution to the goal of "double carbon".

Data availability

Since the data in this article will be used in subsequent studies, the data are not publicly available. However, it may be obtained from the corresponding author (jinlb@haut.edu.cn) at his reasonable request.

References

He, H. et al. Research progress in mechanisms, influence factors and improvement routes of chloride binding for cement composites. J. Build. Eng. 86, 108978 (2024).
Article Google Scholar
Chang, T. P., Chuang, F. C. & Lin, H. C. A mix proportioning methodology for high-performance concrete. J. Chin. Inst. Eng. 19(6), 645–655 (1996).
Article CAS Google Scholar
Bhanja, S. & Sengupta, B. Investigations on the compressive strength of silica fume concrete using statistical methods. Cem. Concr. Res. 32(9), 1391–1394 (2002).
Article CAS Google Scholar
Bharatkumar, B. H. et al. Mix proportioning of high-performance concrete. Cement Concr. Compos. 23(1), 71–80 (2001).
Article CAS Google Scholar
Zain, F. M. & M, M Abd S.,. Multiple regression model for compressive strength prediction of high-performance concrete. J. Appl. Sci. 9(1), 155–160 (2009).
Article Google Scholar
Bischoff, P. H. & Perry, S. H. Compressive behaviour of concrete at high strain rates. Mater. Struct. 24(6), 425–450 (1991).
Article CAS Google Scholar
Chen, H. et al. An RF and LSSVM–NSGA-II method for the multi-objective optimization of high-performance concrete durability. Cement Concr. Compos. 129, 104446 (2022).
Article CAS Google Scholar
Lessard, M., Challal, O. & Aticin, P. C. Testing high-strength concrete compressive strength. Mater. J. 90(4), 303–307 (1993).
CAS Google Scholar
Shi, H., Xu, B. & Zhou, X. Influence of mineral admixtures on compressive strength, gas permeability and carbonation of high-performance concrete. Constr. Build. Mater. 23(5), 1980–1985 (2009).
Article Google Scholar
Salehi, H. & Burgueño, R. Emerging artificial intelligence methods in structural engineering. Eng. Struct. 171, 170–189 (2018).
Article Google Scholar
Huang, H. et al. Property assessment of high-performance concrete containing three types of fibers. Int. J. Concr. Struct. Mater. 15, 1–17 (2021).
Article Google Scholar
Chithra, S. et al. A comparative study on the compressive strength prediction models for high performance concrete containing nano silica and copper slag using regression analysis and artificial neural networks. Constr. Build. Mater. 114, 528–535 (2016).
Article CAS Google Scholar
Ashrafian, A. et al. Prediction of compressive strength and ultrasonic pulse velocity of fiber reinforced concrete incorporating nano silica using heuristic regression methods. Constr. Build. Mater. 190, 479–494 (2018).
Article CAS Google Scholar
Singh, A. et al. Utilization of antimony tailings in fiber-reinforced 3D printed concrete: A sustainable approach for construction materials. Constr. Build. Mater. 408, 133689 (2023).
Article CAS Google Scholar
Yao, X. et al. AI-based performance prediction for 3D-printed concrete considering anisotropy and steam curing condition. Constr. Build. Mater. 375, 130898 (2023).
Article Google Scholar
Long, X. et al. Machine learning method to predict dynamic compressive response of concrete-like material at high strain rates. Def. Technol. 23, 100–111 (2023).
Article CAS Google Scholar
Aiyer, B. G. et al. Prediction of compressive strength of self-compacting concrete using least square support vector machine and relevance vector machine. KSCE J. Civ. Eng. 18(6), 1753–1758 (2014).
Article Google Scholar
Motamedi, S. et al. RETRACTED: Estimating unconfined compressive strength of cockle shell–cement–sand mixtures using soft computing methodologies. Eng. Struct. 98, 49 (2015).
Article Google Scholar
Pham, A. D., Hoang, N. D. & Nguyen, Q. T. Predicting compressive strength of high-performance concrete using metaheuristic-optimized least squares support vector regression. J. Comput. Civ. Eng. 30(3), 06015002 (2016).
Article Google Scholar
Omran, B. A., Chen, Q. & Jin, R. Comparison of data mining techniques for predicting compressive strength of environmentally friendly concrete. J. Comput. Civ. Eng. 30(6), 04016029 (2016).
Article Google Scholar
Zhang, J. et al. Modelling uniaxial compressive strength of lightweight self-compacting concrete using random forest regression. Constr. Build. Mater. 210, 713–719 (2019).
Article Google Scholar
Yuan, Z., Wang, L. N. & Ji, X. Prediction of concrete compressive strength: Research on hybrid models genetic based algorithms and ANFIS. Adv. Eng. Softw. 67, 156–163 (2014).
Article Google Scholar
Chandwani, V., Agrawal, V. & Nagar, R. Modeling slump of ready-mix concrete using genetic algorithms assisted training of artificial neural networks. Expert Syst. Appl. 42(2), 885–893 (2015).
Article Google Scholar
Yan, F. et al. Evaluation and prediction of bond strength of GFRP-bar reinforced concrete using artificial neural network optimized with genetic algorithm. Compos. Struct. 161, 441–452 (2017).
Article Google Scholar
Fan, D. et al. Precise design and characteristics prediction of ultra-high-performance concrete (UHPC) based on artificial intelligence techniques. Cement Concr. Compos. 122, 104171 (2021).
Article CAS Google Scholar
Wang, X., Liu, Y. & Xin, H. Bond strength prediction of concrete-encased steel structures using hybrid machine learning method[C]//Structures. Elsevier 32, 2279–2292 (2021).
MathSciNet Google Scholar
Asteris, P. G. et al. Prediction of self-compacting concrete strength using artificial neural networks. Eur. J. Environ. Civ. Eng. 20(sup1), s102–s122 (2016).
Article Google Scholar
Baykasoğlu, A., Öztaş, A. & Özbay, E. Prediction and multi-objective optimization of high-strength concrete parameters via soft computing approaches. Expert Syst. Appl. 36(3), 6145–6155 (2009).
Article Google Scholar
Chindaprasirt, P. et al. Influence of fly ash fineness on the chloride penetration of concrete. Constr. Build. Mater. 21(2), 356–361 (2007).
Article Google Scholar
Elahi, A. et al. Mechanical and durability properties of high-performance concretes containing supplementary cementitious materials. Constr. Build. Mater. 24(3), 292–299 (2010).
Article Google Scholar
Chujie, J., Wenhua, Z. & Juan, H. Slump and strength model of high-strength concrete. J. Southeast Univ.: Nat. Sci. Edition 40(S2), 144–149 (2010).
Google Scholar
Lim, C. H., Yoon, Y. S. & Kim, J. H. Genetic algorithm in mix proportioning of high-performance concrete. Cement Concr. Res. 34(3), 409–420 (2004).
Article CAS Google Scholar
Pala, M. et al. Appraisal of long-term effects of fly ash and silica fume on compressive strength of concrete by neural networks. Constr. Build. Mater. 21(2), 384–394 (2007).
Article Google Scholar
Prasad, B. K. R., Eskandari, H. & Reddy, B. V. V. Prediction of compressive strength of SCC and HPC with high volume fly ash using ANN. Constr. Build. Mater. 23(1), 117–128 (2009).
Article Google Scholar
Yen, T. et al. Influence of class F fly ash on the abrasion–erosion resistance of high-strength concrete. Constr. Build. Mater. 21(2), 458–463 (2007).
Article Google Scholar
Iqbal, M. F. et al. Sustainable utilization of foundry waste: Forecasting mechanical properties of foundry sand-based concrete using multi-expression programming. Sci. Total Environ. 780, 146524 (2021).
Article CAS PubMed Google Scholar
Shi, C. et al. A review on ultra-high-performance concrete: Part I. Raw materials and mixture design. Constr. Build. Mater. 101, 741–751 (2015).
Article Google Scholar
Wang, D. et al. A review on ultra-high-performance concrete: Part II. Hydration, microstructure and properties. Constr. Build. Mater. 96, 368–377 (2015).
Article Google Scholar
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20(3), 273–297 (1995).
Article Google Scholar
Suratgar, A. A., Tavakoli, M. B. & Hoseinabadi, A. Modified Levenberg-Marquardt method for neural networks training. World Acad. Sci. Eng. Technol. 6(1), 46–48 (2005).
Google Scholar
Alshihri, M. M., Azmy, A. M. & El-Bisy, M. S. Neural networks for predicting compression strength of structural light weight concrete. Constr. Build. Mater. 23(6), 2214–2219 (2009).
Article Google Scholar
Feng, D. C. et al. Machine learning-based compressive strength prediction for concrete: An adaptive boosting approach. Constr. Build. Mater. 230, 117000 (2020).
Article Google Scholar
Chai, T. & Draxler, R. R. Root mean square error (RMSE) or mean absolute error (MAE)–Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 7(3), 1247–1250 (2014).
Article ADS Google Scholar
Bengio, Y. & Le Cun, Y. Scaling learning algorithms towards AI. Large-Scale Kernel Mach. 34(5), 1–41 (2007).
Google Scholar
Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989).
Article Google Scholar
Prusty, J. K. & Pradhan, B. Multi-response optimization using Taguchi-Grey relational analysis for composition of fly ash-ground granulated blast furnace slag based geopolymer concrete. Constr. Build. Mater. 241, 118049 (2020).
Article CAS Google Scholar
Zhang, P., Liu, C. & Li, Q. Application of gray relational analysis for chloride permeability and freeze-thaw resistance of high-performance concrete containing nanoparticles. J. Mater. Civ. Eng. 23(12), 1760–1763 (2011).
Article CAS Google Scholar
Zhu, L., Zhao, C. & Dai, J. Prediction of compressive strength of recycled aggregate concrete based on gray correlation analysis. Constr. Build. Mater. 273, 121750 (2021).
Article Google Scholar
Abuodeh, O. R., Abdalla, J. A. & Hawileh, R. A. Assessment of compressive strength of ultra-high-performance concrete using deep machine learning techniques. Appl. Soft Comput. 95, 106552 (2020).
Article Google Scholar
Kaloop, M. R. et al. Compressive strength prediction of high-performance concrete using gradient tree boosting machine. Constr. Build. Mater. 264, 120198 (2020).
Article Google Scholar
Shi, Y. et al. Design and preparation of ultra-high-performance concrete with low environmental impact. J. Cleaner Prod. 214, 633–643 (2019).
Article CAS Google Scholar

Download references

Acknowledgements

This research was supported by the National Natural Science Foundation of China (52178171), the Special Focus on the Development and Promotion of Henan Province (232103810080, 232102320014, 232102110266), the fund of Henan Key Laboratory of Grain and Oil Storage Facility & Safety (2023KF08), and the Innovative Funds Plan of Henan University of Technology (2022ZKCJ06). The authors express gratitude for their support.

Author information

Authors and Affiliations

School of Civil Engineering, Henan University of Technology, Zhengzhou, 450000, China
Libing Jin, Jie Duan, Pengfei Xue & Pin Zhou
School of Physical Science and Engineering, Beijing Jiaotong University, Beijing, 100044, China
Yichen Jin
China MCC5 Group Corp. Ltd, Chengdu, 610000, China
Jie Duan

Authors

Libing Jin
View author publications
You can also search for this author in PubMed Google Scholar
Jie Duan
View author publications
You can also search for this author in PubMed Google Scholar
Yichen Jin
View author publications
You can also search for this author in PubMed Google Scholar
Pengfei Xue
View author publications
You can also search for this author in PubMed Google Scholar
Pin Zhou
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

L.J.: Conceptualization, Supervision, Funding acquisition. J.D.: Methodology, Software, Writing—original draft, Validation, Investigation. Y.J.: writing—review and editing. P.X.: Funding acquisition, Resources. P.Z.: data and image processing. The author's name is correctly listed as "P.Z.". All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Libing Jin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Jin, L., Duan, J., Jin, Y. et al. Prediction of HPC compressive strength based on machine learning. Sci Rep 14, 16776 (2024). https://doi.org/10.1038/s41598-024-67850-9

Download citation

Received: 07 March 2024
Accepted: 16 July 2024
Published: 22 July 2024
DOI: https://doi.org/10.1038/s41598-024-67850-9
Springer Nature Limited

Prediction of HPC compressive strength based on machine learning

Abstract

Similar content being viewed by others

Prediction of compressive strength of high-performance concrete (HPC) using machine learning algorithms

Employing the optimization algorithms with machine learning framework to estimate the compressive strength of ultra-high-performance concrete (UHPC)

A hybrid machine learning model to estimate self-compacting concrete compressive strength

Introduction

Description and analysis of database

Machine learning approaches