Introduction

Concrete is one of the most widely used building construction materials worldwide due to its several advantages over other materials (Berodier et al., 2019; Larsen et al., 2019; Shamsutdinova et al., 2019; Yoon & Kim, 2019). In recent times, researchers have put enormous effort into improving concrete sustainability, fresh properties (including rheology, stability, and setting) and hardened properties (including strength and durability) by substituting cement with different supplementary cementitious materials (SCMs) (Kaplan & Salem Elmekahal, 2021; Sivamani & Renganathan, 2021). Among the various properties of concrete, compressive strength is one of the most widely used mechanical properties, and it is directly related to the safety of structures. Concrete meeting a specified compressive strength is still required (Al-Shamiri et al., 2019; Liu & Li, 2019; Yu et al., 2019; Yuan et al., 2019), since insufficient compressive strength can lead to catastrophic civil infrastructure failures. However, concrete is made up of various materials such as cement, blast furnace slag, fly ash, water, superplasticizer, and coarse and fine aggregate, and these materials are randomly distributed throughout the entire concrete system. Accurately predicting the compressive strength of this entire system is therefore a significant challenge.

Generally, concrete compressive strength (CCS) can be obtained through physical experiments by preparing concrete cubes or cylinders according to the mix design and curing them for the required time. However, this method is destructive, time-consuming, requires many mix trials, and has low working efficiency (Bischoff & Perry, 1991; Shi et al., 2009). Many researchers have used empirical regression methods (Bhanja & Sengupta, 2002; Bharatkumar et al., 2001; Zain & Abd, 2009) and numerical simulation methods (Feng & Li, 2016; Feng et al., 2018, 2019) to predict the compressive strength of concrete and capture concrete behaviour, but unfortunately, the results show a non-linear relation between the compressive strength and the concrete mixing parameters; thus it is difficult to predict the compressive strength accurately.

On the other hand, with the advancement and promising results of artificial intelligence (AI) in recent years, numerous researchers have used ML algorithms/approaches such as bagged artificial neural networks (BANNs), gradient-boosted artificial neural networks (GBANNs) (Erdal et al., 2013), support vector machines (SVMs) (Latif, 2021), chi-squared automatic interaction detection (CHAID), regression trees, linear regression and ARIMA (Bansal et al., 2021, 2022a, 2022b; Chou & Pham, 2013; Kaveh et al., 2021), ensemble models, genetic weighted pyramid operation tree (GWPOT) (Cheng et al., 2014; Kaveh et al., 2008), ensemble decision trees (Erdal, 2013), metaheuristic-optimized least squares support vector regression (Pham et al., 2016), the fracture mechanics approach (Shafiei Dastgerdi et al., 2019), the compressible packing model (CPM) (Amario et al., 2017), artificial neural networks (ANNs) (Kaveh & Iranmanesh, 1998; Kaveh & Khalegi, 1998; Kaveh et al., 2023; Kostić & Vasović, 2015; Mohammed et al., 2021; Naderpour & Mirrashid, 2018; Naderpour et al., 2018; Słoński, 2010; Young et al., 2019), hybrid models (Shishegaran et al., 2021), the quadratic polynomial model (Imanzadeh et al., 2018), and mixture optimization models (Miller et al., 2016; Zahiri & Eskandari-Naddaf, 2019; Zhang et al., 2016) to predict the CCS and in other applications. A review of some of these models is given below.

Shafiei Dastgerdi et al. (2019) investigated the impact of different concrete parameters such as w/c ratio, aggregate shape, paste, air void, and fly ash content on the crack resistance of railroad concrete by utilizing a two-parameter model (TPM). Tests were carried out on twelve three-point bending prisms at different concrete compressive strengths. Their study shows that decreasing the w/c ratio and increasing aggregate size and volume improved the fracture toughness ratio by 30%, whereas the other concrete parameters had a negligible influence on fracture toughness. Amario et al. (2017) analysed the feasibility of adopting the compressible packing model (CPM) for proportioning concrete mixtures produced with recycled concrete aggregates (RCAs). Aggregate replacement levels from 0 to 100% were taken into account, and various structural RCA mixtures were designed for three strength classes. Finally, the implemented process was verified experimentally by carrying out durability and mechanical tests on chosen mixtures with RCA contents near 60% for the three strength classes. Their study shows that the CPM correlates well with RCAs and that overall durability performance is not influenced by the presence of RCA. Young et al. (2019) presented an initial analysis of a large dataset consisting of measured compressive strengths from original (job-site) mixtures and their respective mixture proportions. The correlation between the mixture design variables and strength was investigated by applying a predictive model, an ANN. The method was also applied to a laboratory-based dataset of strength measurements, and its performance on the two datasets was compared. Their results show that the ANN reduces labour and time intensity and offers better robustness, quality control and cost efficiency, thus proving the superiority of the proposed architecture. However, this method needs a large amount of data, especially for large architectures, and is harder to visualize. Kostić and Vasović (2015) implemented an ANN-based prediction approach for CCS. In their work, three-layer feed-forward neural networks with two, six and nine hidden nodes were examined using four diverse learning methods. The most precise predictions, with the largest coefficient of determination (R2), were attained with six hidden nodes trained by Levenberg–Marquardt and with nine hidden nodes trained by the Broyden–Fletcher–Goldfarb–Shanno (BFGS), scaled conjugate gradient and one-step secant methods. The analysis thus showed the improved efficiency of the proposed ANN model over conventional models. To achieve an expected compressive strength, Imanzadeh et al. (2018) introduced mixture design as a tool for optimizing raw earth concrete formulations. The experiment was conducted as a comparative analysis against conventional models. The outcomes demonstrated that the mixture design technique is an effective tool for developing and optimizing raw earth concrete formulations. Miller et al. (2016) developed an approach to predict the global warming potential (GWP) and compressive strength based on the water-to-binder (w/b) ratio for concrete mixtures. Their results show a linear correlation between GWP and cement content. However, more robust prediction tools still need to be developed, and multiple design criteria need to be examined. Zhang et al. (2016) used RCA to replace coarse natural aggregate (NA) at different replacement levels. Their outcomes show that the asphalt concrete mix design yields lower apparent relative density, higher water absorption and lower crushing and wearing values as the RCA content increases. The main limitation is that further study needs to be conducted on RCAs from various sources. Zahiri and Eskandari-Naddaf (2019) designed twelve mixes involving diverse percentages of nano-silica (NS), micro-silica (MS) and polymer fibres in three cement strength classes (CSCs) based on a mixture optimization model. The experimental outcomes showed that each CSC has a different sensitivity to MS or NS in terms of concrete compressive strength. Consequently, in the concrete mix design, the strength classes have a considerable impact on the required quantities of NS and MS, whereas the polymer fibres had no considerable impact on the compressive strength when accounting for the CSCs. The mixture optimization method achieved a better strength class with increasing CCS. Many researchers also monitor and predict early-age hydration and compressive strength using smart sensors such as piezo sensors together with machine learning techniques (Saravanan et al., 2015a, b; Bharathi Priya et al., 2018; Bansal & Talakokula, 2020).

Based on the above-mentioned literature, it is concluded that most existing studies use only a single ML model to predict compressive strength. Also, no comparative approach is available that can help the community choose the best prediction method. To overcome this issue, this paper presents a comparative study of various ML models for predicting the compressive strength of concrete. The contributions of this work are summarized below:

  • A comparative study and analysis of various ML models (Ordinary Least Square, Ridge Regression, Lasso Regression, ElasticNet, K Nearest Neighbours, CART, Random Forest, AdaBoost, Gradient Tree Boosting and Xtreme Gradient Boost) for the precise prediction of compressive strength of concrete.

  • Hyperparameter optimization of the top-performing models to further improve their accuracy.

Dataset

The performance of machine learning models generally depends on the number of samples in the dataset. To build and compare high-accuracy models, a large number of samples is necessary, which can be obtained from previous literature (Erdal et al., 2013). The dataset consists of 1030 samples and nine features.
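Since the dataset is tabular, it can be loaded and inspected with a few lines of Python; a minimal sketch is shown below (the CSV file name and the use of pandas are illustrative assumptions, not details from the original study).

```python
# Minimal sketch: load the 1030-sample dataset for inspection.
# "concrete_data.csv" is an assumed local file name.
import pandas as pd

df = pd.read_csv("concrete_data.csv")
print(df.shape)             # expected: (1030, 9) -- eight inputs plus the target
print(df.columns.tolist())  # the nine feature names listed in Table 1
```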

Features

The features of concrete considered are listed below in Table 1.

Table 1 Features of the dataset

Statistical information

Since the concrete compressive strength is being predicted, it is the target variable, and the rest of the attributes are input variables. Table 2 gives a statistical description of the dataset: count, minimum/maximum values, mean, quartiles (25%, 50% and 75% values) and standard deviation. From this table, it is observed that the average compressive strength is 35.82 MPa with a standard deviation of 16.70 MPa over the 1030 samples. The cement content ranges between 102 and 540 kg/m³; this wide range is due to the partial replacement of cement with fly ash and blast furnace slag, whose contents range between 0–359.4 and 0–200.1 kg/m³, respectively. Age is also one of the most important parameters for the development of compressive strength; here the age values range between 1 and 365 days with a standard deviation of 63.17 days. The feature distribution of each parameter is shown in Fig. 1, which allows direct observation of the parameters; the dotted line represents the mean value of each parameter.

Table 2 Statistical description of data set
Fig. 1 Feature distribution bar charts
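A Table 2-style summary can be reproduced directly from the loaded data; a short sketch is given below, where df is the DataFrame from the earlier loading sketch and "CCS" is an assumed label for the compressive strength column.

```python
# Statistical description of the dataset: count, mean, std, min,
# quartiles (25%, 50%, 75%) and max for every feature, as in Table 2.
summary = df.describe()
print(summary.round(2))

# Mean and standard deviation of the target quoted in the text
print(df["CCS"].mean())  # ~35.82 MPa
print(df["CCS"].std())   # ~16.70 MPa
```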

Correlation analysis

The first step in any predictive analytics is data exploration, which includes checking for missing values, checking for attribute correlation, and observing the distribution of all features. To build a robust and accurate model, the output feature should be correlated with the input variables. The correlation between features can be calculated using Pearson's correlation factor, where a higher absolute value of the correlation factor represents a stronger dependence between the two variables. Correlation analysis also helps to remove features that are not correlated with the target (dependent) variable. It should be noted that one of two highly correlated input features (independent variables) can be removed, as they are considered redundant.

Figure 2 shows the correlation heatmap of all the features. From this figure, it can be seen that cement, superplasticizer and age are the three parameters best correlated with compressive strength, with absolute values of 0.5, 0.37 and 0.33, respectively, whereas fly ash is the least correlated, with an absolute value of 0.11. In addition, the correlation between water and superplasticizer has an absolute value of 0.66, which is relatively high, since the superplasticizer allows a reduction in water content while increasing the strength and workability of concrete.

Fig. 2 Pearson correlation between the features represented in the form of a heatmap
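The correlation analysis above takes only a few lines; the sketch below (illustrative, reusing the df DataFrame from earlier) computes Pearson's correlation factors and draws a Fig. 2-style heatmap with seaborn.

```python
# Pearson correlation between all pairs of features, drawn as a heatmap.
import matplotlib.pyplot as plt
import seaborn as sns

corr = df.corr(method="pearson")  # correlation matrix of all nine features
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.tight_layout()
plt.show()
```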

Methodology

A machine learning model aims to map the input independent variables to the target dependent variable by automatically learning a mathematical model. Models are trained on training set samples and tested on unknown samples called the testing set. A machine learning model is said to be "underfitting" when it cannot perform well even on the training set samples. The model is said to be "overfitting" if its performance on the training set is good but it underperforms on unknown samples from the testing set. Machine learning engineers commonly apply regularization strategies to handle overfitting.

Models for study

In this study, twelve Machine Learning (ML) algorithms are used to predict the concrete compressive strength (a code sketch instantiating them follows the list). They are:

  1. Ordinary least squares (OLS) is a simple linear regression approach that aims to map the input features to the output target by learning a linear model with n + 1 parameters (where n is the number of features) using a numerical solution.

  2. Ridge regression is a regularized linear regression approach that aims to train a generalized regression model that performs well on both training and testing set samples. Ridge regression applies an L2 norm penalty (a regularization strategy that helps avoid overfitting) on the parameters, which keeps the coefficients from growing too large.

  3. Lasso regression is another variant of regularized linear regression, where an L1 norm penalty is applied for regularization.

  4. ElasticNet combines the L1 and L2 norm penalties in a linear regression model for more efficient regularization. This model utilizes the best of both norm penalties in an integrated solution.

  5. K-nearest neighbours (KNN) is a non-parametric model for both classification and regression. The KNN model for regression finds the nearest (most similar) samples and averages the neighbours' target values as the final prediction. This model works on the assumption that similar samples are most likely to have similar outputs.

  6. Classification and regression trees (CART) is a rule-based machine learning algorithm that resembles a tree data structure, where every internal node represents a condition and every leaf contains a prediction value.

  7. Random forest is a classic example of ensemble learning. It takes advantage of multiple decision trees and their predictions by fusing them into a single prediction, which compensates for the errors of the individual regression trees.

  8. AdaBoost is another ensemble learning model (similar to random forest) which builds multiple small decision trees with a single split (stumps). The stumps are learned gradually by focusing more on samples mistakenly predicted by the previously learned stumps.

  9. Gradient tree boosting also utilizes an ensemble of decision trees as weak learners. The gradient boosting technique allows each new tree to fit the residuals of the preceding trees.

  10. Xtreme gradient boost (XGBoost) is a faster implementation of the gradient boosting technique, which aims to provide scalable, portable and distributed gradient boosting for applications.

  11. The MLP regressor (multilayer perceptron regressor) is a densely connected neural network for regression. Neural networks are known for their superior performance in most computational problems and are used for both classification and regression. Neural networks mimic the human brain by having multiple layers of neurons for learning hierarchical features from the input array. In a neural network, each neuron receives an array input and outputs a scalar value called the activation.

  12. Support vector regression (SVR) is a linear model for regression. Unlike support vector classification (SVC), which aims to maximize the margin between classes through support vectors, SVR gives the regressor flexibility by ignoring errors made on samples that fall inside a tolerance boundary around the prediction.
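As a reference for how the twelve models can be set up, a minimal sketch using scikit-learn and the xgboost package is given below; the hyperparameter values shown are library defaults, not the tuned values reported later in the paper.

```python
# Instantiating the twelve regression models compared in this study.
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor

models = {
    "Ordinary Least Squares": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),             # L2 penalty
    "Lasso": Lasso(alpha=1.0),             # L1 penalty
    "ElasticNet": ElasticNet(alpha=1.0),   # combined L1 + L2 penalty
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "CART": DecisionTreeRegressor(),
    "Random Forest": RandomForestRegressor(n_estimators=100),  # bagging
    "AdaBoost": AdaBoostRegressor(),       # boosted shallow trees
    "Gradient Tree Boosting": GradientBoostingRegressor(),
    "XGBoost": XGBRegressor(),
    "MLP": MLPRegressor(max_iter=2000),
    "SVR": SVR(),
}
```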

Proposed architecture

The flowchart of the proposed methodology is shown in Fig. 3. The proposed architecture consists of three major modules, namely (1) dataset pre-processing, (2) model training and evaluation, and (3) model inference using a GUI. Detailed information regarding the individual modules is presented here.

  1. Dataset pre-processing: In this module, the dataset is loaded and randomly split into training and testing sets. The features of the dataset are further scaled into a smaller range.

    • Dataset splitting: Firstly, the unprocessed dataset is checked for missing values; if any are found, the corresponding data entries should be either removed or imputed. The dataset used in this study has no missing values. The unprocessed dataset is split into training (80%) and testing (20%) sets using random shuffling, where the training set is used to train the models and the testing set is used to evaluate model performance (see the sketch after this list). The training set contains 824 data entries and the testing set contains 206 data entries. The dataset is split because the testing set indicates how the trained model will perform on samples it has not yet seen. The same training and testing sets are then used to train all the algorithms and to evaluate their performance, which ensures a fair comparison.

    • Feature scaling: In addition, the training set is standardized and normalized by scaling the features, yielding two new versions of the training set, referred to as the standardized dataset and the normalized dataset. The models are trained on these versions of the dataset to determine which scaling, if any, gives higher accuracy in evaluation. The accuracy of each algorithm across the three training sets is compared, and each algorithm is listed with its appropriate scaling. The top five performing algorithms are then selected for hyperparameter optimization to further improve model accuracy, since hyperparameters help control the learning process.

  2. Model training and evaluation: In this module, the pre-processed features are used for training the machine learning models. Feature selection is also used to find the best feature subset.

    • Feature selection: The algorithm with the highest accuracy after optimization is used for feature selection. Feature selection based on Pearson's correlation factor alone is not conclusive, so three feature selection strategies are applied. The first strategy uses the correlation between the input features and the output feature: input features are removed in increasing order of correlation to give an understanding of their importance in prediction. The second strategy removes input features according to the correlation between them, i.e. removing highly inter-correlated input features (absolute correlation greater than 0.5). High inter-dependence among input features means that any one of the inter-dependent features can be used for prediction without affecting the accuracy much, and removing a feature decreases the complexity of the model. The third strategy is to train the model with each feature removed one at a time, and also with each feature alone. The resulting model accuracies also indicate the importance of each input feature in the prediction. The feature importance of the model can then be compared to the theoretical understanding of concrete ingredients and their effect on compressive strength.

    • Hyperparameter optimization: The selected feature subset is further used for tuning the hyperparameters of the machine learning models. This step helps to find the optimal hyperparameters for each machine learning model.

  3. Model inference using GUI: In this module, the trained models can be used for inference on new incoming data. Optionally, results from multiple machine learning models can be fused into a single prediction using an averaging or weighted-averaging fusion technique. The real-time data for inference also goes through the same pre-processing steps as the training and testing set samples.
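A minimal sketch of the pre-processing module is given below, assuming the df DataFrame and the "CCS" target label from the earlier sketches; the random seed is illustrative.

```python
# Dataset pre-processing: 80/20 random split plus the two scaled variants.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = df.drop(columns=["CCS"]).values
y = df["CCS"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)

# Standardized version: zero mean, unit standard deviation per feature
std = StandardScaler().fit(X_train)           # fit on the training set only
X_train_std, X_test_std = std.transform(X_train), std.transform(X_test)

# Normalized version: min-max scaling into the 0-1 range
mm = MinMaxScaler().fit(X_train)
X_train_mm, X_test_mm = mm.transform(X_train), mm.transform(X_test)
```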

Fig. 3 Flowchart of training methodology

Results and discussion

Evaluation measures

In this study, the performance of the prediction models is evaluated by calculating the following measures: R2, explained variance score (EVS), mean absolute error (MAE), mean squared error (MSE) and maximum residual error (MRE). The expressions of the evaluation measures are shown in Eqs. (1)–(4); MRE is the maximum error made by the model among all samples.

$$R^{2} = 1 - \frac{{\sum\nolimits_{i} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }}{{\sum\nolimits_{i} {\left( {y_{i} - \overline{y}} \right)^{2} } }}$$
(1)
$${\text{explained}}\,{\text{variance}}\,(y,\hat{y}) = 1 - \frac{{{\text{Var}}(y - \hat{y})}}{{{\text{Var}}(y)}}$$
(2)
$${\text{MSE}} = \frac{1}{n}\sum\limits_{i} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} }$$
(3)
$${\text{MAE}} = \frac{1}{n}\sum\limits_{i} {\left| {y_{i} - \hat{y}_{i} } \right|}$$
(4)

Evaluation measures such as MAE, MSE and MRE represent errors made by the model during prediction, whereas the R2 score and EVS denote the similarity between predicted and target values.
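All five measures are available in scikit-learn; a short sketch is given below, where y_test and y_pred stand for the target and predicted values of the testing set.

```python
# Evaluation measures of Eqs. (1)-(4) plus the maximum residual error.
from sklearn.metrics import (explained_variance_score, max_error,
                             mean_absolute_error, mean_squared_error,
                             r2_score)

print("R2 :", r2_score(y_test, y_pred))
print("EVS:", explained_variance_score(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MRE:", max_error(y_test, y_pred))  # largest single-sample error
```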

Influence of feature scaling: normalization versus standardization

Comparison of the algorithms starts with exploring and processing the data to make it suitable for training. The model training results on the unprocessed training set are given in Table 3. The results show that the linear model ordinary least squares and the regularized linear models Ridge, Lasso and ElasticNet perform with an accuracy of 0.61 R2. Since the features may have non-linear relationships with the target variable, these linear models do not fit this scenario well. Support vector regression performs poorly with only 0.54 R2, as SVR is also a kind of linear model. The K-nearest neighbour (KNN) algorithm, which works by finding the most similar samples in the dataset, performs better than the linear models with an R2 score of 0.76. The artificial neural network model, the multilayer perceptron, performs well with 0.83 R2 by extracting non-linear features at each layer. The best-performing models for the unprocessed concrete compressive strength dataset are the decision trees. The AdaBoost algorithm, an ensemble boosting decision tree model, performs with 0.80 R2, while classification and regression trees (CART) perform with an accuracy of 0.88 R2. The boosting ensembles, gradient tree boosting and the regularized Xtreme gradient boosting, perform with almost the same accuracy of 0.90 R2 each. Random forest, a bagging ensemble of decision trees, gives an accuracy of 0.91 R2. It can be seen that the tree-based and tree-ensemble models outperform the linear models for regression. It is also widely known that tree-based machine learning models do not need much pre-processing of the training set.

Table 3 Unprocessed dataset—concrete compressive strength results

Feature scaling is generally required for machine learning models, as the estimators may perform in a subpar manner without it. Standardization scales the features into a smaller range with zero mean and unit standard deviation. Table 4 shows the results of model training with the standardized dataset. The accuracy of the ordinary least squares method and ridge regression does not change much with the standardized dataset, at 0.60 R2. KNN also shows no improvement with standardization and stays at 0.75 R2. All the decision tree models stay at the same accuracy without any improvement. The accuracy of Lasso regression and ElasticNet decreases to 0.55 R2 and 0.48 R2, respectively. SVR has a minor increase in accuracy, to 0.55 R2. The standardized dataset is most helpful for the MLP regressor, as it increases its accuracy to 0.89 R2.

Table 4 Standardized dataset—concrete compressive strength results

Table 5 shows the performance of the models on the normalized dataset. The normalization feature scaling technique scales the features into the 0–1 range using min–max normalization. The performance of almost all models decreases on the normalized training set; only ordinary least squares and KNN perform better. In Table 5, ordinary least squares improves by 0.08, to 0.68 R2, while KNN's accuracy increases by 0.02, to 0.77 R2. The reduction for the regularized linear models is due to a high alpha (regularization strength). When alpha is decreased for Lasso, Ridge and ElasticNet to 10⁻⁵, 10⁻⁶ and 10⁻⁷, respectively, their accuracy reaches 0.68 R2, though further decreasing alpha does not increase the accuracy.

Table 5 Normalized dataset—concrete compressive strength results
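The three-way comparison in Tables 3, 4 and 5 can be sketched as a simple loop over the models and dataset versions from the earlier sketches; the R2 values printed would populate the tables.

```python
# Train and score every model on the unprocessed, standardized and
# normalized versions of the data (test-set R2, as in Tables 3-5).
versions = {
    "unprocessed": (X_train, X_test),
    "standardized": (X_train_std, X_test_std),
    "normalized": (X_train_mm, X_test_mm),
}
for name, model in models.items():
    for tag, (Xtr, Xte) in versions.items():
        r2 = model.fit(Xtr, y_train).score(Xte, y_test)  # .score() returns R2
        print(f"{name:24s} {tag:12s} R2 = {r2:.2f}")
```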

Top performing models

The selection of the appropriate scaling for each model is made based on the accuracy obtained from the three versions of the dataset (unprocessed vs standardized vs normalized). Table 6 lists each model with the highest accuracy it achieved across the unprocessed, normalized and standardized datasets. The linear models and KNN perform better with the normalized dataset. The MLP regressor and SVR improve their accuracy with the standardized dataset. The decision tree models are the most accurate for the concrete compressive strength dataset. The ensemble models with boosting (gradient tree boosting and Xtreme gradient boosting) and bagging (random forest) perform best with an accuracy of 0.9 R2. That decision trees do not require feature scaling is apparent, as scaling the features to zero mean and unit variance (standardization) or to the 0–1 range (normalization) decreases the differences between data entries, and the reduced differences between feature values make it harder for a regression decision tree to form nodes and set split conditions.

Table 6 Model accuracy with appropriate scaling

Hyperparameter optimization

Table 7 shows the performance of the top five models after hyperparameter optimization. For hyperparameter optimization, both a grid-based search and a random search were used. It is observed that the gradient boosted trees algorithm performs very well for concrete compressive strength prediction with an R2 value of 0.94, followed by Xtreme gradient boosted trees, random forest, the MLP regressor and the CART algorithm with R2 values of 0.93, 0.91, 0.90 and 0.88, respectively. In addition, it is noticed that the performance of all top five models improves after hyperparameter optimization. This is because hyperparameter optimization identifies a tuple of hyperparameters that yields an optimal model minimizing a predefined loss function on given independent data (Yuan-Fu, 2019).

Table 7 Top 5 performing model after hyper-parameter optimization
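A sketch of the grid-based search for the best-performing model is given below; the parameter grid is illustrative, not the grid used in the study (a RandomizedSearchCV call with the same interface covers the random search variant).

```python
# Grid-based hyperparameter optimization of gradient boosted trees.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {                      # illustrative search space
    "n_estimators": [100, 300, 500],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
}
search = GridSearchCV(GradientBoostingRegressor(), param_grid,
                      scoring="r2", cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
print("test R2:", search.best_estimator_.score(X_test, y_test))
```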

Feature selection

The automated feature selection performed with the gradient boosted trees algorithm shows that the most important feature for concrete compressive strength prediction is the age of the concrete, followed by the cement content; the least important feature is the coarse aggregate. Hence, it can be concluded that, using the gradient boosted trees algorithm, contractors at the site can predict the compressive strength of concrete by simply providing the raw material quantities (cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate) as input and obtaining the response (compressive strength).
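Feature importances of this kind can be read directly from the fitted model; a sketch is given below, reusing the tuned estimator from the grid search sketch (the feature names are assumed labels for the eight inputs of Table 1, in the assumed column order).

```python
# Rank the eight input features by gradient-boosted-tree importance.
import numpy as np

feature_names = ["cement", "blast furnace slag", "fly ash", "water",
                 "superplasticizer", "coarse aggregate", "fine aggregate",
                 "age"]
gbt = search.best_estimator_          # tuned model from the grid search above
for i in np.argsort(gbt.feature_importances_)[::-1]:
    print(f"{feature_names[i]:20s} {gbt.feature_importances_[i]:.3f}")
```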

Conclusion

In this study, twelve prominent Machine Learning algorithms are used and compared on the concrete compressive strength dataset. Further, the algorithms were also trained on scaled datasets after standardization and normalization to determine whether scaling or the unprocessed dataset works better for regression. Furthermore, the top five models were hyperparameter-optimized. The linear models performed better with the normalized dataset, reaching 0.68 R2; the regularized linear models matched ordinary least squares but did not outperform it even after decreasing alpha to 10⁻⁷. The top five performing models after hyperparameter optimization were gradient boosted trees with 0.94 R2, Xtreme gradient boosted trees with 0.93 R2, random forest with 0.91 R2, the MLP regressor with 0.90 R2 and CART with 0.88 R2. Hence, based on the R2 values, it is concluded that the gradient boosted trees algorithm can be used to predict the compressive strength of concrete, which helps the research community reduce the cost and time of concrete mix design and avoid the material waste caused by numerous mixture trials.