Introduction

In the construction industry, the sustainability of construction materials is now a critical issue. Over the last decade, there has been remarkable momentum toward using industrial by-products in the construction sector to achieve sustainability. Time and cost are the most critical considerations when planning any construction project (Aprianti et al. 2015; Zhong and Wu 2015; Kylili and Fokaides 2017; Mirzahosseini et al. 2019; Latif et al. 2020c). Conventional concrete and high-performance concrete (HPC) have been commonly used in traditional and industrial construction, respectively. HPC is a composite material designed to withstand environmental conditions, and it is used as high-strength concrete in bridges, tall buildings, tunnels, and pavement structures. In structural applications, the durability and workability of HPC are also essential considerations. Developing concrete with the characteristics required of HPC is a complex process (Kasperkiewicz et al. 1995; Chou et al. 2011; Zhong and Wille 2015; Gonzalez-Corominas and Etxeberria 2016; Hoang et al. 2016; Wang et al. 2019; Kaloop et al. 2020).

Nowadays, artificial intelligence is widely used for prediction and other purposes in engineering fields (Borhana et al. 2020; Ehteram et al. 2020a; Latif et al. 2021b, c; Ehteram et al. 2020b, c; Lai et al. 2020; Latif et al. 2020a, b, 2021a; Najah et al. 2021; Parsaie et al. 2021; Jumin et al. 2021; Latif and Ahmed 2021). Many previous studies used soft computing techniques to predict the properties of HPC and the concrete compressive strength (CCS) and to test the influence of the input variables (Gupta et al. 2006; Chopra et al. 2018; Dutta et al. 2018; Yaseen et al. 2018; Al-Shamiri et al. 2019, 2020; Vakharia and Gujar 2019; Young et al. 2019; Feng et al. 2020; Latif 2021). For instance, Abuodeh et al. (2020) employed two techniques, sequential feature selection (SFS) and the neural interpretation diagram (NID), to identify the material constituents that most influence the predictions of an artificial neural network (ANN). They trained the ANN on 110 compressive strength tests of ultra-high performance concrete (UHPC), using the material quantities as inputs. Four material components, namely cement, fly ash, silica fume, and water, were selected and used in the ANN, which then gave more precise predictions than the model with all eight material components. Finally, they developed a nonlinear regression model based on the four selected constituents. Their findings show that combining the ANN with SFS and NID greatly improved the model's accuracy and offered useful insight into ANN predictions of compressive strength for various UHPC mixes.

Ling et al. (2019) optimized a support vector machine (SVM) model using K-fold cross-validation to predict and evaluate the degradation of CCS in a complex marine environment. They also built ANN and decision tree (DT) models to compare their prediction precision with that of the SVM model. Their results showed that the SVM model had the best prediction performance.

Furthermore, Shaqadan (2020) developed SVM and neural network models to predict CCS using five input variables, including the silica additive fraction. He used a data set of 90 samples, with compressive strength measured after 3 and 28 days for different milling times. According to his results, the SVM and ANN models achieved good correlation coefficients of 0.929 and 0.986, respectively.

Another study, by Naderpour et al. (2018), used an ANN to predict the compressive strength of recycled aggregate concrete. They used 139 existing data sets derived from 14 published sources to create the training and testing data for the ANN. Their findings suggest that the ANN is a useful method for predicting the compressive strength of recycled aggregate concrete made with various types and sources of recycled aggregate.

Moreover, Deng et al. (2018) used a deep learning model, constructed with softmax regression, to predict the compressive strength of recycled concrete. Their findings revealed that the deep learning model outperforms the conventional neural network model in terms of precision, performance, and generalization ability, and could be considered a new approach for calculating the strength of recycled concrete.

Furthermore, Latif (2021) developed a deep learning model for predicting CCS. He built a long short-term memory (LSTM) model and used SVM as a conventional machine learning baseline to compare the accuracy of the two models. LSTM outperformed SVM, with R2 = 0.98 versus 0.78, MAE = 1.861 versus 6.152, and RMSE = 2.36 versus 7.93.

In addition, Van Dao et al. (2019) applied an adaptive neuro-fuzzy inference system (ANFIS) and an ANN to predict the compressive strength of geopolymer concrete (GPC), using the concrete mixture proportions as input parameters. Their findings revealed that both models were successful, but ANFIS performed better than ANN.

In this paper, boosted decision tree regression (BDTR) and SVM models are developed to predict CCS, and their accuracy is compared.

Materials and methods

Data set

Nine previously reported variables are used: eight input parameters and one output variable, the concrete compressive strength, expressed in megapascals (MPa). The data set contains 1030 instances with no missing data. Table 1 summarizes the statistics of the data set components.

Table 1 Statistical components of the utilized datasets

BDTR model

Boosted tree regression is a hybrid model that combines statistical and soft computing techniques. Unlike traditional regression and non-regression approaches, boosted regression trees (BRT) combine the predictions of many regression trees to build the best overall regression model. Furthermore, boosted tree regression can capture nonlinear relationships between the input and output parameters without prior transformation or removal of input parameters. Boosted tree regression relies on two techniques: regression trees and boosting. A key advantage of the regression tree approach is that it is built directly from the outcomes of decision trees. With respect to the predictor parameters, regression trees are robust to outliers and can accommodate missing data. To improve model accuracy, the boosting method combines numerous decision trees (Jumin et al. 2020). The BDTR algorithm is as follows:

$$ \hat{y}(x)={\sum}_{\mathrm{t}}{w}_{\mathrm{t}}{h}_{\mathrm{t}}(x) $$
(1)
$$ O={\sum}_{\mathrm{i}}l\left({\hat{y}}_{\mathrm{i}},{y}_{\mathrm{i}}\right)+{\sum}_{\mathrm{t}}\Omega \left({f}_{\mathrm{t}}\right) $$
(2)

where ht(x) is the output of tree t, wt is its weight, \( l\left({\hat{y}}_{\mathrm{i}},{y}_{\mathrm{i}}\right) \) is the loss function measuring the distance between the prediction and the true value for the ith sample, and Ω(ft) is the regularization term for the weighted tree ft. Fig. 1 shows the structure of the BDTR model.

Fig. 1

The structure of a typical BDTR (Lai et al. 2019)
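As a minimal sketch of Eqs. (1) and (2), the Python below evaluates a weighted sum of one-split regression trees (stumps) and the corresponding regularized objective. The stump representation, the squared-error loss, and the regularizer Ω(f) = γ·(number of leaves) + ½λ·Σ(leaf value)², the form popularized by XGBoost, are all assumptions for illustration; the source does not specify Ω, and all names and numbers here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """A one-split regression tree h_t(x) (illustrative assumption)."""
    feature: int      # index of the input variable used for the split
    threshold: float  # split point
    left: float       # leaf value when x[feature] <= threshold
    right: float      # leaf value when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def ensemble_predict(x: Sequence[float], trees: List[Stump], weights: List[float]) -> float:
    """Eq. (1): y_hat(x) = sum over t of w_t * h_t(x)."""
    return sum(w * t.predict(x) for w, t in zip(weights, trees))


def omega(tree: Stump, weight: float, gamma: float, lam: float) -> float:
    """Assumed regularizer for f_t = w_t * h_t: gamma * leaves + 0.5 * lam * sum(leaf^2)."""
    leaves = [weight * tree.left, weight * tree.right]
    return gamma * len(leaves) + 0.5 * lam * sum(v * v for v in leaves)


def objective(X, y, trees, weights, gamma: float = 0.0, lam: float = 1.0) -> float:
    """Eq. (2): sum of squared-error losses plus the regularization of every tree."""
    loss = sum((ensemble_predict(x, trees, weights) - yi) ** 2 for x, yi in zip(X, y))
    reg = sum(omega(t, w, gamma, lam) for t, w in zip(trees, weights))
    return loss + reg


# Hypothetical two-tree ensemble and a single hypothetical sample.
trees = [Stump(0, 300.0, 20.0, 40.0), Stump(1, 180.0, 5.0, -5.0)]
weights = [1.0, 0.5]
print(ensemble_predict([350.0, 170.0], trees, weights))  # 42.5
```

The regularization term penalizes large leaf values, so among ensembles that fit the training data equally well, Eq. (2) prefers the simpler one.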

Support vector machine

SVM is a machine learning method grounded in statistical learning theory and designed to learn statistical rules from small samples. Based on the principle of structural risk minimization, SVM addresses practical problems such as limited samples, nonlinearity, and high dimensionality, and its convex training problem has a global minimum; together, these properties improve its generalization ability (Ben-Hur and Weston 2010). Fig. 2 shows the general structure of SVM.

Fig. 2

The architecture of SVM (Latif 2021)

Many conventional neural network models minimize the training error, following the empirical risk minimization principle. SVM, in contrast, follows the structural risk minimization principle: it minimizes an upper bound on the generalization error by balancing the training error against the capacity of the model (Latif 2021).
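For reference, the standard primal form of epsilon-SVR makes this trade-off explicit. It is shown here as a sketch (the source does not state the optimization problem): the term ½‖w‖² controls model capacity, while the slack variables ξi and ξi*, weighted by C, penalize training errors larger than ε.

$$ \begin{aligned} \min_{w,\,b,\,\xi,\,\xi^{*}}\ & \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{N}\left(\xi_i+\xi_i^{*}\right)\\ \text{s.t.}\ & y_i-\left(w^{\mathrm{T}}\phi\left(x_i\right)+b\right)\le \varepsilon+\xi_i,\\ & \left(w^{\mathrm{T}}\phi\left(x_i\right)+b\right)-y_i\le \varepsilon+\xi_i^{*},\\ & \xi_i,\ \xi_i^{*}\ge 0,\quad i=1,\dots,N \end{aligned} $$

Here φ maps the inputs into the feature space implied by the kernel, and C > 0 sets how strongly training errors are weighted against capacity; the values of C and ε used in this study are not reported.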

In this study, the SVM model used a sigmoid kernel:

$$ Sigmoid\ Kernel:K\ \left({x}_{\mathrm{i}},{x}_{\mathrm{j}}\right)=\tanh \left(\gamma\, {x}_{\mathrm{i}}^{\mathrm{T}}{x}_{\mathrm{j}}+r\right) $$
(3)

where K(xi, xj) is the kernel function, xi and xj are input vectors, and γ and r are the kernel parameters.
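A minimal sketch of Eq. (3) in plain Python follows. The defaults γ = 0.10 and r = 0.00 are taken from the settings reported later for this study, assuming that the reported "coefficient" corresponds to r; the function name is hypothetical.

```python
import math
from typing import Sequence


def sigmoid_kernel(xi: Sequence[float], xj: Sequence[float],
                   gamma: float = 0.10, r: float = 0.0) -> float:
    """Eq. (3): K(xi, xj) = tanh(gamma * xi^T xj + r)."""
    dot = sum(a * b for a, b in zip(xi, xj))  # inner product xi^T xj
    return math.tanh(gamma * dot + r)


print(sigmoid_kernel([1.0, 2.0], [3.0, 0.5]))  # tanh(0.4), about 0.38
```

With unscaled inputs (cement contents of hundreds of kg/m³, for example), γ·xiᵀxj becomes large and tanh saturates near 1, so inputs are normally standardized before applying this kernel; whether this study did so is not stated. Unlike the RBF kernel, the sigmoid kernel is not positive semidefinite for all choices of γ and r.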

There are two forms of SVM regression: type 1 (epsilon-SVR) and type 2 (nu-SVR). This study used type 1, epsilon-SVR.
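Epsilon-SVR scores predictions with the epsilon-insensitive loss: errors inside a tube of half-width ε cost nothing, and larger errors grow linearly. A minimal sketch follows; the ε value used in this study is not reported, so the values below are hypothetical.

```python
def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Epsilon-SVR loss: zero inside the +/- epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical CCS values in MPa and a hypothetical epsilon of 2 MPa.
print(epsilon_insensitive_loss(40.0, 41.5, 2.0))  # 0.0 (inside the tube)
print(epsilon_insensitive_loss(40.0, 45.0, 2.0))  # 3.0
```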

Performance indices

When assessing predictive models, it is important to use a range of performance indices to decide which model is best. This study uses four statistical indices: R2, RMSE, MAE, and RSR.

R2

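R2 is the primary performance index in regression analysis. Here it is computed as the square of the correlation coefficient between the observed values xi and the predicted values yi, where X and Y are their means, σx and σy their standard deviations, and N the number of samples: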

$$ {R}^2={\left\{\frac{1}{N}\sum_{i=1}^{N}\frac{\left({x}_i-X\right)\left({y}_i-Y\right)}{{\sigma}_x\,{\sigma}_y}\right\}}^2 $$
(4)
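Eq. (4) can be sketched in Python as follows, assuming population standard deviations (dividing by N) so that the 1/N factor in the equation yields the Pearson correlation exactly. The function name is hypothetical.

```python
import math
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (4): squared Pearson correlation between observed x and predicted y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)  # population standard deviation
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    if sx == 0 or sy == 0:
        raise ValueError("R2 is undefined when either series is constant")
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)
    return r ** 2


print(r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # 0.25
```

Because this R2 is a squared correlation, predictions that are systematically offset from the observations (for example, every prediction 5 MPa too high) still score R2 = 1. This is one reason error indices such as RMSE and MAE are reported alongside it.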

RMSE

RMSE is the square root of the mean squared residual (prediction error), and it expresses the typical size of the errors in the units of the output:

$$ RMSE=\sqrt{\frac{1}{n}\ {\sum}_{i=1}^n{\left( Co- Cp\right)}^2} $$
(5)

where Co and Cp are the observed and predicted CCS values, respectively, and n is the number of samples.
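Eq. (5) translates directly into a short Python function; the name and sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (5): square root of the mean squared difference between Co and Cp."""
    n = len(observed)
    return math.sqrt(sum((co - cp) ** 2 for co, cp in zip(observed, predicted)) / n)


# Hypothetical CCS values in MPa: every prediction is off by 2 MPa.
print(rmse([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 32.0, 38.0]))  # 2.0
```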

MAE

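MAE measures the average magnitude of the errors between paired observations of the same quantity, here the observed and predicted CCS. The larger the difference between the true and predicted values, the larger the prediction error. In Eq. (6), yi is the observed value, λ(xi) is the model prediction for input xi, and n is the number of samples.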

$$ MAE=\frac{1}{n}\sum_{i=1}^n\left|{y}_i-\lambda \left({x}_i\right)\right| $$
(6)
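A minimal Python sketch of Eq. (6) follows, with a hypothetical function name and sample values.

```python
from typing import Sequence


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (6): mean absolute difference between y_i and lambda(x_i)."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)


print(mae([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 35.0, 40.0]))  # 2.25
```

MAE weights all errors linearly, whereas RMSE squares them first, so RMSE is always at least as large as MAE, and the gap widens when a few large errors dominate.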

RSR

The RMSE-observations standard deviation ratio (RSR) is used in this research to identify the best model for predicting CCS. RSR is a valuable index for testing computational models:

$$ RSR=\frac{\sqrt{\sum_{i=1}^N{\left({Y}_i^{obs}-{Y}_i^{sim}\right)}^2}}{\sqrt{\sum_{i=1}^N{\left({Y}_i^{obs}-{Y}_i^{mean}\right)}^2}} $$
(7)

where \( {Y}_i^{obs} \) and \( {Y}_i^{sim} \) are the observed and simulated (predicted) values, \( {Y}_i^{mean} \) is the mean of the observed data, and N is the number of samples (Ehteram et al. 2020b). Table 2 presents the RSR ranges and the corresponding performance ratings.

Table 2 RSR range and the corresponding performance rate
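Eq. (7) and a rating lookup can be sketched in Python as follows. The rating thresholds (very good ≤ 0.50, good ≤ 0.60, satisfactory ≤ 0.70, otherwise unsatisfactory) are an assumption based on the widely used Moriasi et al. (2007) guidelines, and Table 2 may differ; the function names are hypothetical.

```python
import math
from typing import Sequence


def rsr(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Eq. (7): RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(observed) / len(observed)
    num = math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)))
    den = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    return num / den


def rsr_rating(value: float) -> str:
    """Assumed thresholds in the style of Moriasi et al. (2007); Table 2 may differ."""
    if value <= 0.50:
        return "very good"
    if value <= 0.60:
        return "good"
    if value <= 0.70:
        return "satisfactory"
    return "unsatisfactory"


print(rsr_rating(0.37), "|", rsr_rating(0.76))  # very good | unsatisfactory
```

A model that always predicts the observed mean gives RSR = 1, so values well below 1 indicate that the model explains much of the variation in the data.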

Results and discussion

The data were randomly partitioned: 80% of the samples were used to train each machine learning algorithm, and the remaining 20%, which the model did not see during training, were used to test the trained model. The same procedure was applied to the second data partition. In this research, two machine learning algorithms, BDTR and SVM, were developed and compared to check their accuracy in predicting the compressive strength of concrete.
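A random 80/20 split of the 1030 instances can be sketched with Python's standard library as follows. The function name and the fixed seed are assumptions for reproducibility; the seed and splitting tool used in the study are not reported.

```python
import random
from typing import List, Tuple


def train_test_split_indices(n: int, train_fraction: float = 0.8,
                             seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the indices 0..n-1 and split them into training and testing parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split can be reproduced
    n_train = int(round(train_fraction * n))
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(1030)
print(len(train_idx), len(test_idx))  # 824 206
```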

Two approaches can be used to configure models on these training data. The first is the traditional approach, in which settings such as the learning rate or the number of trees are adjusted manually. The second is to add a hyperparameter tuning module that searches for these settings automatically.
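The second approach can be sketched as an exhaustive grid search over the learning rate and the number of trees. This is an illustration only: the source does not say which tuning module or search strategy was used, and `grid_search` and `toy_validation_error` are hypothetical names. In practice, the evaluation function would train a BDTR model with the given settings and return its validation error.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def grid_search(evaluate: Callable[[float, int], float],
                learning_rates: Iterable[float],
                tree_counts: Iterable[int]) -> Tuple[Optional[dict], float]:
    """Evaluate every (learning rate, number of trees) pair; keep the lowest error."""
    best_params: Optional[dict] = None
    best_error = float("inf")
    for lr, n_trees in product(learning_rates, tree_counts):
        error = evaluate(lr, n_trees)  # e.g. validation RMSE for these settings
        if error < best_error:
            best_params, best_error = {"learning_rate": lr, "n_trees": n_trees}, error
    return best_params, best_error


# Hypothetical stand-in for "train a model and return its validation error".
def toy_validation_error(lr: float, n_trees: int) -> float:
    return (lr - 0.2) ** 2 + ((n_trees - 100) / 100) ** 2


params, err = grid_search(toy_validation_error, [0.05, 0.1, 0.2, 0.4], [50, 100, 200])
print(params, err)  # {'learning_rate': 0.2, 'n_trees': 100} 0.0
```

A grid search is the simplest tuning strategy; random search or Bayesian optimization can explore larger spaces more cheaply.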

The BDTR model is a boosted regression tree (BRT) model, which combines two techniques: decision tree algorithms and boosting. BRTs fit many decision trees iteratively to improve the model's accuracy; BDTR is trained with the multiple additive regression trees (MART) gradient boosting algorithm. Boosting builds the set of trees in a stage-wise manner, and each tree depends on the previous ones: the error of the trees built so far is calculated with a predefined loss function and corrected by the next tree. The final prediction is therefore a combination of several weak prediction models, which together yield a reliable prediction model.
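The stage-wise procedure can be illustrated with least-squares gradient boosting, the squared-loss case of MART, using depth-1 trees (stumps). This is a minimal pure-Python sketch under stated assumptions: the squared loss, stump depth, learning rate, and all names are illustrative, and the study's actual BDTR implementation and settings are not reported. In terms of Eq. (1), every tree receives the same weight wt, equal to the learning rate.

```python
from typing import List, Optional, Sequence, Tuple

# A depth-1 regression tree: (feature index, threshold, left value, right value).
Stump = Tuple[int, float, float, float]


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Optional[Stump]:
    """Return the single split that minimizes the squared error of the residuals r."""
    best: Optional[Stump] = None
    best_sse = float("inf")
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            if not left or not right:
                continue  # a split must send samples to both leaves
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, thr, lm, rm), sse
    return best


def stump_predict(s: Stump, x: Sequence[float]) -> float:
    j, thr, lm, rm = s
    return lm if x[j] <= thr else rm


def boost(X, y, n_trees: int = 50, learning_rate: float = 0.1):
    """Least-squares gradient boosting: each new stump fits the current residuals."""
    base = sum(y) / len(y)  # stage 0: a constant prediction
    pred = [base] * len(y)
    trees: List[Stump] = []
    for _ in range(n_trees):
        # For squared loss, the residuals are the negative gradient (up to a factor of 2).
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, residuals)
        if s is None:
            break  # no valid split exists
        trees.append(s)
        # Shrink each tree's contribution by the learning rate before adding it.
        pred = [pi + learning_rate * stump_predict(s, x) for pi, x in zip(pred, X)]
    return base, learning_rate, trees


def boost_predict(model, x: Sequence[float]) -> float:
    base, learning_rate, trees = model
    return base + sum(learning_rate * stump_predict(s, x) for s in trees)
```

Each stage corrects part of the error left by the earlier stages, which is the sense in which many weak trees add up to a strong predictor.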

For the SVM model, a sigmoid kernel was used with two kernel parameters, γ = 0.10 and coefficient r = 0.00. Cross-validation was applied to improve the accuracy of the proposed model. SVM regression type 1 (epsilon-SVR) was used, with training and testing partitions of 80% and 20%, respectively.
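K-fold cross-validation divides the training samples into k folds and uses each fold once for validation while the other k - 1 folds are used for training. A minimal sketch follows; k = 5 is an assumption, since the number of folds used in this study is not reported, and the function name is hypothetical.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, validation) index lists for k contiguous folds over 0..n-1.

    Shuffle the samples beforehand if their order is not random.
    """
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        validation = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n))
        yield training, validation
        start += size


folds = list(k_fold_indices(824, k=5))  # 824 = 80% of the 1030 instances
print([len(v) for _, v in folds])  # [165, 165, 165, 165, 164]
```

The validation folds are drawn only from the training partition, so the 20% test partition stays unseen until the final evaluation.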

The purpose of this study is to test and compare the ability of the BDTR and SVM models to predict CCS. To ease the performance review, the observed and predicted outputs were tabulated and plotted, and the models were evaluated using the statistical indices R2, RMSE, MAE, and RSR.

The BDTR model was successfully applied to predict CCS, achieving R2 = 0.86, RMSE = 6.19 MPa, MAE = 4.91 MPa, and RSR = 0.37 for the overall dataset. Fig. 3 shows the BDTR predictions of CCS plotted against the actual data, together with a scatter plot, for the overall dataset.

Fig. 3

Performance of the BDTR model: (a) actual versus predicted values; (b) scatter plot for the overall data

On the testing dataset, the model's accuracy was lower, with R2 = 0.81, RMSE = 6.71 MPa, MAE = 5.38 MPa, and RSR = 0.44. Fig. 4 shows the BDTR predictions of CCS plotted against the actual data, together with a scatter plot, for the testing dataset.

Fig. 4

Performance of the BDTR model: (a) actual versus predicted values; (b) scatter plot for the testing data

Table 3 shows the performance of the BDTR model for the overall and testing datasets.

Table 3 Performance of BDTR model for overall and testing dataset

The SVM model was less accurate than the BDTR model. For the overall dataset, SVM achieved R2 = 0.48, RMSE = 12.70 MPa, MAE = 9.53 MPa, and RSR = 0.76; for the testing dataset, it achieved R2 = 0.45, RMSE = 13.24 MPa, MAE = 9.52 MPa, and RSR = 0.81. Table 4 shows the performance of the proposed SVM model for the overall and testing datasets.

Table 4 Performance of SVM model for overall and testing dataset

Figs. 5 and 6 show the SVM predictions of CCS plotted against the actual data, together with scatter plots, for the overall and testing datasets, respectively.

Fig. 5

Performance of the SVM model: (a) actual versus predicted values; (b) scatter plot for the overall data

Fig. 6

Performance of the SVM model: (a) actual versus predicted values; (b) scatter plot for the testing data

According to the results, BDTR outperformed the SVM model. Table 5 compares the BDTR and SVM models for the testing and overall datasets.

Table 5 Comparison of BDTR and SVM results for overall and testing dataset
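As a quick worked check of the comparison, the testing-set errors reported above (RMSE of 6.71 and 13.24 MPa, MAE of 5.38 and 9.52 MPa for BDTR and SVM) imply the relative error reductions computed below. The helper name is hypothetical.

```python
# Testing-set errors reported in this study, in MPa.
bdtr = {"RMSE": 6.71, "MAE": 5.38}
svm = {"RMSE": 13.24, "MAE": 9.52}


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of an error metric relative to the baseline model."""
    return (baseline - improved) / baseline


for metric in ("RMSE", "MAE"):
    print(metric, round(relative_reduction(svm[metric], bdtr[metric]), 3))
# RMSE 0.493
# MAE 0.435
```

In other words, BDTR roughly halves the RMSE of SVM on unseen data and reduces its MAE by about 43%.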

The RSR value of the BDTR model falls in the very good range, whereas that of the SVM model falls in the unsatisfactory range. Fig. 7 shows the RSR of BDTR and SVM for the overall dataset.

Fig. 7

RSR value for the computed models

According to the results, BDTR outperformed SVM by a large margin. These results can be compared with previous studies in the literature. Ling et al. (2019) optimized an SVM model and built ANN and DT models to compare their prediction precision with that of SVM; in their study, SVM outperformed the ANN and DT models. Shaqadan (2020) developed SVM and ANN models to predict CCS and obtained correlation coefficients of 0.929 and 0.986, respectively. Furthermore, Latif (2021) developed LSTM and SVM models for predicting CCS, and LSTM outperformed SVM, with R2 = 0.98 versus 0.78, MAE = 1.861 versus 6.152, and RMSE = 2.36 versus 7.93. Overall, the results of the current study are in line with other studies in the field, and BDTR can be a reliable model for predicting CCS.

Conclusion

To build a suitable model, this study predicted CCS using two machine learning models, namely BDTR and SVM. Eight distinct concrete components were used as the model inputs, and the observed CCS was used as the model output. In predicting concrete compressive strength, BDTR outperformed SVM. The results suggest that BDTR is an effective method for predicting CCS, although its performance may depend on the suitability of the input data. The outcome of this study may be useful for identifying weaknesses in concrete compressive strength. Future experiments could use a different dataset to check the consistency of the proposed model.