Introduction

The demand for and cost of construction materials are increasing due to the world’s rapidly growing population (Shadmani et al. 2018). The ever-growing consumption of natural resources to satisfy market demand and economic growth has had a detrimental impact on the environment and has led to a shortage of raw materials (Saberian et al. 2021). In some countries, agricultural wastes/byproducts can be harmful to the ecosystem if they are not recycled or pretreated (Sodhi et al. 2021). The full or partial use of solid agricultural wastes/byproducts as a substitute for raw materials, particularly in concrete production, has been studied by many researchers (e.g., Aslam et al. 2016a, Chinnu et al. 2021, Rashad 2016, Shafigh et al. 2014b). The reported results show that this replacement is a promising approach to achieving sustainable development (Islam et al. 2016). Among agricultural wastes/byproducts, oil palm shell (\(OPS\)) (see Fig. 1), which is abundantly available in tropical countries such as Indonesia, Malaysia, and Thailand, can be easily and efficiently used as a construction material for concrete production (Hamada et al. 2020).

Fig. 1 Shape of \(OPS\) aggregate (Mo et al. 2015, 2018)

According to current statistics, around \(4.56\) million tons of \(OPS\) waste are produced every year (Shafigh et al. 2012b). The advantages of utilizing \(OPS\) waste as a lightweight aggregate (\(LWA\)) in the production of lightweight aggregate concrete (\(LWAC\)) have been reported by many researchers (e.g., Ahmad Zawawi et al. 2020, Aslam et al. 2016c). Moreover, numerous studies have shown that using \(OPS\) decreases the need for coarse aggregate obtained from natural resources while improving sustainability owing to lower pollution levels. According to the existing literature, structural concrete produced with \(OPS\) as an \(LWA\) has acceptable compressive strength (\({f}_{c}\)) at \(28\;days\) and a 20–30% lower density than normal weight concrete (\(NWC\)) (Aslam et al. 2016b; Shafigh et al. 2014c). Furthermore, \(OPS\) concrete displays good structural performance in terms of flexural and bond strength (Johnson Alengaram et al. 2011; Teo et al. 2006; Thomas et al. 2017).

The term “lightweight concrete (\(LWC\))” refers to concrete that has an oven-dried density of less than \(2000\;kg/m^3\) and can be manufactured from various natural aggregates. The term structural \(LWC\), on the other hand, refers to concrete with an oven-dried density of less than \(2000\;kg/m^3\) that is made using coarse \(LWA\)s together with normal-weight fine aggregate, or with both fine and coarse \(LWA\)s. A high-strength lightweight aggregate concrete (\(HS-LWAC\)) has a compressive strength of \(34-69\;MPa\) and a dry density of less than \(2000\;kg/m^3\) (Mehta and Monteiro 2014). Generally, a water-to-cement (\(W/C\)) ratio of less than \(0.45\) is used to obtain the desired high strength in \(LWC\) (Hoff 2002). High-strength concrete (\(HSC\)) of normal weight generally achieves a cylindrical compressive strength of \(40\;MPa\) and above. It was used in the construction industry in \(1960\) with a compressive strength of up to \(50\;MPa\). According to American Concrete Institute reports, \(HSC\) of normal density has a compressive strength above \(41\;MPa\) (American Concrete Institute 1997). According to Mehta and Monteiro (2014), concrete with good-quality \(LWA\) and a high cement content may reach compressive strengths of \(40\) to \(50\;MPa\).

Previous research has demonstrated that, in concrete production, agricultural wastes/byproducts can replace normal coarse aggregate to produce structural \(LWC\) (Alengaram et al. 2013). Shafigh et al. (2011b) proposed making \(HS-LWAC\) by crushing large \(OPS\) shells and using them as coarse aggregate. The physical bond between the crushed \(OPS\) shell and the hydrated cement paste was reported to be strong, and the shell itself was quite hard. The compressive strengths reported in that investigation were around \(53\) and \(56\;MPa\) at \(28\) and \(56\;days\), respectively. Furthermore, it was reported that \(Grade\;30\) \(OPS\) concrete could be manufactured without the use of any supplementary cementitious material. Other research demonstrated that \(OPS\) concretes with a \(28-day\) \({f}_{c}\) of around \(43-48\;MPa\) and a dry density of around \(1870-1990\;kg/m^3\) can be produced both with and without limestone powder (Alengaram et al. 2013; Shafigh et al. 2014c). To ensure an adequate compressive capacity of \(OPS-LWAC\), the compressive strength (\({f}_{c}\)) generally needs to be evaluated as the determinative factor. As a result, accurate and reliable prediction of the compressive strength of \(OPS-LWAC\) before it is used is critical for making crucial judgments (Zhang et al. 2020).

Nowadays, linear/nonlinear regression techniques are widely used for predicting concrete characteristics (Sadrmomtazi et al. 2019). However, there are few regression models for estimating the compressive strength of \(OPS\) concrete. Furthermore, obtaining an accurate regression equation from empirical models is quite challenging (Chou and Pham 2013). Among recently developed machine learning (\(ML\)) approaches, gene expression programming (\(GEP\)) (Ferreira 2001), the adaptive neuro-fuzzy inference system (\(ANFIS\)) (Jang 1993), and artificial neural networks (\(ANN\)s) (Hornik et al. 1989) have been widely employed as alternatives to conventional statistical methods/models (i.e., regressions) (e.g., Farooq et al. 2021, Latif 2021a). Therefore, in this study, the above-mentioned \(ML\) approaches are used to capture the relationship between the input factors and the \({f}_{c}\) of \(HS-OPS-LWAC\).

To the best of the authors’ knowledge, no \(ML\)-based model exists for estimating/predicting the compressive strength of \(HS-OPS-LWAC\). Therefore, this study uses a comprehensive database collected from the literature to predict the compressive strength of \(HS-OPS-LWAC\) by employing the \(ANN\), \(ANFIS\), and \(GEP\) approaches. The efficiency, performance, and predictive validity of the employed approaches are then compared using multiple statistical measures. The “Research methodology” section describes the collected dataset and explains the employed regression and \(ML\) methods. The “Modeling procedure” section presents the data preparation, the performance measures, and the development of the models, and the “Results and discussion” section evaluates and compares the efficiency and performance of the proposed models. The main findings are summarized in the “Conclusions” section.

Research methodology

In this section, the data collected on \(HS-LWAC\) mix designs are first described, along with their descriptive statistics. The regression and \(ML\) methods used in this study, namely \(MLR\), \(GEP\), \(ANFIS\), and \(ANN\), are then explained in detail.

Data collection

To predict the compressive strength of \(HS-LWAC\), a dataset including \(229\) experimental data records was compiled from previous research studies (Alengaram et al. 2008a, b; Aslam et al. 2015, 2016b, c, 2017, 2018; Farahani et al. 2017a, b; Maghfouri et al. 2017, 2018, 2020; Muthusamy et al. 2020; Shafigh et al. 2011a, b, 2012a, b, c, 2013a, b, 2014a, c, 2016, 2018; Yahaghi et al. 2016). This dataset included information such as the content of fine aggregate (\(Sand\)), natural coarse aggregate (\(Gravel\)), ordinary Portland cement (\(OPC\)), fly ash (\(FA\)), silica fume (\(SF\)), superplasticizer (\(SP\)), and \(OPS\), as well as the water-to-binder (\(W/B\)) ratio. It also included the age and compressive strength (\({f}_{c}\)) values of the test specimens. Since the aim of this study was the prediction of the compressive strength of \(HS-LWAC\), the \(28-day\) \({f}_{c}\) of all specimens was higher than \(34\;MPa\). Histograms of the modeling input and output variables are displayed in Fig. 2.

Fig. 2 Histograms of the database parameters

Table 1 provides the descriptive statistics of the input and output variables. As can be seen in this table, for specimens with ages of \(1\) to \(120\;days\), the compressive strength varies from \(13.71\) to \(84.45\;MPa\). Of all the specimens, \(76\) contained no gravel, which allowed the investigation of the effect of \(100\%\) substitution with \(OPS\). The highest content of \(OPS\) (i.e., \(451.5\;kg/m^3\)) decreased the density of the concrete to \(1900\;kg/m^3\) (see Table 1). In contrast, in designs without \(OPS\), where the coarse aggregate was entirely gravel, the density values were greater than \(2228\;kg/m^3\). In \(22\) mix designs, \(FA\) was used as a binder in addition to \(OPC\). The highest \(FA\) content was \(165\;kg/m^3\), with an \(OPC\) content of \(385\;kg/m^3\) in that design, while the lowest \(FA\) content among these \(22\) designs was \(22.85\;kg/m^3\), with a corresponding \(OPC\) content of \(388\;kg/m^3\). The use of \(FA\) slightly increased the \(28-day\) \({f}_{c}\); moreover, it had a positive effect on the drying shrinkage of the concrete and improved its durability (Mo et al. 2020). In addition, in \(37\) mix designs, \(SF\) was used as a partial replacement for cement, with contents ranging from \(45.7\) to \(60\;kg/m^3\). The mix designs containing \(SF\) had higher compressive strength values than those without \(SF\), likely because of the fineness of the \(SF\) particles and the reaction of silicon dioxide with calcium hydroxide.

Table 1 Descriptive statistics of the input and output variables

Multiple linear regression approach

Regression approaches predict how a dependent variable varies as the independent variable(s) change. Multiple linear regression (\(MLR\)) (Andrews 1974), often known simply as multiple regression, is an approach that statistically predicts the outcome of a response variable by combining multiple explanatory variables/parameters. Since the \(MLR\) approach contains more than one explanatory (independent) variable, multiple regression is essentially an extension of ordinary least-squares (\(OLS\)) regression and can be expressed as follows:

$$y={\beta }_{0}+{\beta }_{1}{x}_{1}+\cdots +{\beta }_{n}{x}_{n}+\varepsilon$$
(1)

where \(y\) is the predicted value of the dependent variable; \({\beta }_{0}\) is the \(y\)-intercept (the value of \(y\) when all other parameters are set to zero); \({\beta }_{1}\) and \({\beta }_{n}\) are the regression coefficients of the first and last independent variables, respectively; \({x}_{1}\) and \({x}_{n}\) are the first and last independent variables, respectively; and \(\varepsilon\) is the model error. To obtain the best-fit line, \(MLR\) approaches calculate three quantities for each independent (explanatory) variable: (1) the regression coefficients that minimize the overall model error, (2) the \(t\)-statistic of the overall model, and (3) the \(p\)-value corresponding to the overall model’s \(t\)-statistic. A \(t\)-statistic and a \(p\)-value are then calculated for each regression coefficient.
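For illustration only, the sketch below fits Eq. (1) by ordinary least squares in Python with statsmodels and reports the coefficients, \(t\)-statistics, and \(p\)-values discussed above; this is not the software used in this study, and the randomly generated arrays stand in for the real mix-design records.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical predictor matrix X (n samples x 9 mix-design variables)
# and target vector y (compressive strength, MPa); replace with real data.
rng = np.random.default_rng(0)
X = rng.random((229, 9))
y = rng.random(229) * 50 + 30

X_const = sm.add_constant(X)       # prepends the intercept column (beta_0)
ols = sm.OLS(y, X_const).fit()     # ordinary least-squares fit of Eq. (1)

print(ols.params)    # beta_0 ... beta_9 (regression coefficients)
print(ols.tvalues)   # t-statistic of each coefficient
print(ols.pvalues)   # p-value of each coefficient
```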

Gene expression programming approach

The gene expression programming (\(GEP\)) method (Ferreira 2001), based on Darwin’s theory of evolution and Mendel’s genetic theory, is one of the most logically appealing computational intelligence formalisms. \(GEP\) algorithms use two languages: the language of the genes and the language of the expression trees (\(ET\)s). Comprehending one of these languages requires knowledge of the sequence/structure of the other (Ferreira 2002). The basic processes involved in standard \(GEP\) modeling are as follows. \(GEP\) modeling starts with the random generation of a specified number of chromosomes, which are expressed in the \(Karva\) language (i.e., as symbol strings). A chromosome or gene usually has a head and a tail; the head is composed of terminal symbols or functions, whereas the tail contains only terminal symbols (Shishegaran et al. 2020). In a \(GEP\) model, the number of sub-\(ET\)s is determined by the head size, which accounts for the complexity of each parameter. The chromosomes have fixed lengths and may be readily converted/transformed into an algebraic equation, as seen in Fig. 3.

Fig. 3 A typical \(GEP\) model, the algebraic equation, and its corresponding \(ET\) with phenotype, along with the crossover and mutation processes

Each \(GEP\) gene has a fixed-length collection of terms adopted from the function set, which includes arithmetic operations (\(+,\;-,\;\times,\;\div\)) and functions such as \(Boolean\) logic (\(AND\), \(OR\), \(NOT\), etc.), mathematical functions (\(cos\), \(sin\), \(ln\)), conditionals (\(IF\), \(THEN\), \(ELSE\)), and so on. The chromosomes are then represented by \(ET\)s of various sizes and shapes. The major genetic operators of crossover, transposition, mutation, and recombination (one-point, two-point, and gene recombination) are then applied to the chromosomes in line with their assigned rates (Londhe et al. 2021). The mutation and crossover processes and a typical \(ET\) are displayed in Fig. 3. It is worth noting that the \(ET\) is represented in \(Karva\) notation (\(K\)-expression). The whole process stops when a suitable solution is reached or the maximum number of generations is attained (the stop condition). If the termination requirements (maximum number of iterations or preferred fitness value) are not fulfilled, a selection scheme such as the roulette wheel method, ranking/tournament selection, or an elite strategy is applied, and the procedure is repeated until the best solution is found or the defined number of generations is reached.
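A strict \(GEP\) implementation requires dedicated software; as a loose, hedged stand-in, the Python sketch below evolves an algebraic expression with gplearn’s tree-based genetic programming (closely related to, but not identical with, \(GEP\)), using an arithmetic/trigonometric function set and tournament selection on placeholder data.

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor   # GP stand-in for GEP

rng = np.random.default_rng(0)
X = rng.random((229, 9))                 # hypothetical mix-design inputs
y = rng.random(229) * 50 + 30            # hypothetical f_c targets (MPa)

gp = SymbolicRegressor(
    population_size=500,                 # number of programs (chromosomes)
    generations=20,                      # stop condition: generation limit
    function_set=('add', 'sub', 'mul', 'div', 'sin', 'cos', 'log'),
    tournament_size=20,                  # tournament selection
    p_crossover=0.7, p_subtree_mutation=0.1,
    p_point_mutation=0.1, p_hoist_mutation=0.05,
    random_state=0,
)
gp.fit(X, y)
print(gp._program)                       # evolved algebraic expression
```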

Adaptive neuro-fuzzy inference system approach

The adaptive neuro-fuzzy inference system (\(ANFIS\)) (Jang 1993) is an appealing computational intelligence modeling technique that combines the learning capability of \(ANN\)s with the reasoning capability of fuzzy logic. \(ANFIS\) has strong estimation ability and is a good alternative for processing complicated nonlinear problems more precisely (Gholizadeh et al. 2022). \(ANFIS\) algorithms learn from the collected training data and then map the obtained solutions onto a fuzzy inference system (\(FIS\)) (Saradar et al. 2020).

Using the \(ANFIS\) tool in \(MATLAB\), a typical \(FIS\) consists of several phases. The first is the introduction of the inputs, which are fuzzified into fuzzy sets according to the activated linguistic rules. Following that, specific rules/guidelines are either created by specialists or derived from numerical data available in the literature. The next stage is inference, which involves mapping the fuzzy sets according to the established rules. Finally, the fuzzy sets are defuzzified, resulting in the final output values. In other words, the \(ANFIS\) technique is made up of five key steps: (\(1\)) preparing the dataset; (\(2\)) developing the \(ANFIS\); (\(3\)) setting up the variables; (\(4\)) training and then validation; and (\(5\)) obtaining the results. The architecture of the \(ANFIS\) for the nine input variables (\(OPC\), \(FA\), \(SF\), \(W/B\), \(SP\), \(Sand\), \(Gravel\), \(OPS\), and \(Age\)) is shown in Fig. 4. More detail regarding the method and development of \(ANFIS\) can be found in Mohammadi Golafshani et al. (2021).
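The \(ANFIS\) models in this study were built with the MATLAB tool mentioned above; purely to make the fuzzification, rule firing, normalization, and defuzzification sequence concrete, the sketch below implements the forward pass of a first-order Sugeno \(FIS\) in numpy, assuming Gaussian membership functions and placeholder parameter values.

```python
import numpy as np

def sugeno_anfis_forward(x, centers, sigmas, coeffs, intercepts):
    """Forward pass of a first-order Sugeno FIS (one rule per cluster).

    x          : (n_inputs,) one input sample
    centers    : (n_rules, n_inputs) Gaussian MF centers
    sigmas     : (n_rules, n_inputs) Gaussian MF widths
    coeffs     : (n_rules, n_inputs) linear consequent coefficients
    intercepts : (n_rules,)          consequent intercepts
    """
    # Layer 1: fuzzification with Gaussian membership functions
    mu = np.exp(-0.5 * ((x - centers) / sigmas) ** 2)   # (n_rules, n_inputs)
    # Layer 2: rule firing strengths (product T-norm)
    w = mu.prod(axis=1)                                 # (n_rules,)
    # Layer 3: normalization of firing strengths
    w_bar = w / w.sum()
    # Layer 4: rule consequents (first-order polynomials)
    f = coeffs @ x + intercepts                         # (n_rules,)
    # Layer 5: weighted sum = defuzzified crisp output
    return float(w_bar @ f)

# Hypothetical 5-rule system for the nine inputs (placeholder values)
rng = np.random.default_rng(0)
n_rules, n_inputs = 5, 9
y_hat = sugeno_anfis_forward(
    x=rng.random(n_inputs),
    centers=rng.random((n_rules, n_inputs)),
    sigmas=np.full((n_rules, n_inputs), 0.3),
    coeffs=rng.random((n_rules, n_inputs)),
    intercepts=rng.random(n_rules),
)
print(y_hat)
```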

Fig. 4 The employed \(ANFIS\) schematic with the defined parametric conjunction operations

Artificial neural network approach

Artificial neural networks (\(ANN\)s) (Hornik et al. 1989) are computer algorithms that can accurately and effectively solve forecasting and classification problems. They are mathematical models based on the properties of biological neuron networks similar to those of the human brain (Liu et al. 2021). \(ANN\)s have a layered structure with a variety of processing elements (\(PE\)s) and arranged nodes, including (1) an input layer composed of the independent variables, (2) one or more hidden layers composed of several hidden variables, also known as hidden neurons, and (3) an output layer containing the outputs/target values (Ahmed et al. 2022) (see Fig. 5).

Fig. 5 The architecture of the used feed-forward \(ANN\) with nine inputs

The influential factors in this research were chosen as inputs to produce the respective output, i.e., the compressive strength of concrete (\({f}_{c}\)), as shown in Fig. 5. Each input from the preceding layer (\(OPC\), \(FA\), \(SF\), \(W/B\), \(SP\), \(Sand\), \(Gravel\), \(OPS\), \(Age\)) is multiplied by an appropriate weight factor (weight connection) in the hidden layer. A threshold value is added to the sum of each node’s weighted input signals. The combined input then goes through a transfer phase that applies a non-linear transfer function (\(TF\)) (Latif 2021b).

Linear, step, logistic sigmoid, and hyperbolic tangent sigmoid functions are the most widely employed activation functions (\(AF\)s) in \(ANN\)s. The output of one \(PE\) serves as the input for the subsequent \(PE\). Each neuron in the hidden and output layers applies a logistic-type function as its \(AF\) (Parsaie et al. 2021). The \(AF\) is an essential property of a neural network and has a substantial influence on the performance and efficiency of the \(ANN\) model; therefore, choosing a viable and workable \(AF\) is critical (Ehteram et al. 2021). In this study, to increase the performance and accuracy of the output, a backpropagation neural network (\(BPNN\)) with a bipolar sigmoid \(AF\), whose output lies in the range of \(-1\) to \(+1\), is employed in the hidden layer, and the linear \(PURELIN\) \(AF\) is employed in the output layer. Increasing the number of neurons in each layer with these \(AF\)s and \(TF\)s improves the statistical indices for the training dataset; however, it can decrease the accuracy for the testing and validation datasets (Ghadami et al. 2021).

The training/learning phase begins when the \(ANN\) starts propagating the collected data (information) from the input layer, and the weight factors (connections) are modified according to the specified rules for finding the best combination of weights to create the least amount of error possible (Shahmansouri et al. 2021). The trained model is then verified using a new testing set. More detail regarding the \(ANN\) approach and its development can be found in Shahmansouri et al. (2022).

Modeling procedure

To model the compressive strength of \(HS-LWAC\), the nine input variables introduced and characterized in the previous section, together with the compressive strength as the output variable, were considered. A total of \(229\) experimental data points was used in all the methods.

Data curation

For the \(ANN\) and \(ANFIS\) modeling approaches, considering the differences between the input and output domains, all input variables were normalized to increase the accuracy and speed of the models (Shahmansouri et al. 2022). To this end, using Eq. (2), the input variables were normalized to the range of \(0.1-0.9\).

$${x}_{i}=0.8\frac{x-{x}_{min}}{{x}_{max}-{x}_{min}}+0.1$$
(2)

where \(x\) is the measured value of a parameter and \({x}_{i}\) is its normalized value; \({x}_{min}\) and \({x}_{max}\) are the minimum and maximum values of variable \(x\) in the data. Note that, since the \(GEP\) modeling accounts for the effect of weights, there is no need to normalize the data for this method. It should also be mentioned that, in the \(MLR\) method, normalizing the data had a negative effect and lowered the model’s performance.
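A minimal Python implementation of Eq. (2); the example OPC contents are hypothetical and only show the expected input/output behavior.

```python
import numpy as np

def normalize_01_09(x):
    """Scale a variable to the range [0.1, 0.9] as in Eq. (2)."""
    x = np.asarray(x, dtype=float)
    return 0.8 * (x - x.min()) / (x.max() - x.min()) + 0.1

# Example: hypothetical OPC contents (kg/m^3)
print(normalize_01_09([380, 420, 480, 550]))   # values between 0.1 and 0.9
```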

Performance parameters

The developed models’ performance was assessed using parameters including \(RMSE\), \(MAE\), and \({R}^{2}\) through the following equations:

$$RMSE=\sqrt{\frac{1}{n}\sum\limits_{i=1}^{n}{\left({t}_{i}-{o}_{i}\right)}^{2}}$$
(3)
$$MAE=\frac{1}{n}\sum\limits_{i=1}^{n}\left|{t}_{i}-{o}_{i}\right|$$
(4)
$${R}^{2}=\frac{{\left(n\sum {t}_{i}{o}_{i}-\sum {t}_{i}\sum {o}_{i}\right)}^{2}}{\left(n\sum {{t}_{i}}^{2}-{\left(\sum {t}_{i}\right)}^{2}\right)\left(n\sum {{o}_{i}}^{2}-{\left(\sum {o}_{i}\right)}^{2}\right)}$$
(5)

where \({t}_{i}\) and \({o}_{i}\) are the target and output (predicted) values of the \(i\)-th data point, respectively, and \(n\) is the total number of data points.
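The three indices can be computed directly from Eqs. (3)–(5); a short Python sketch:

```python
import numpy as np

def rmse(t, o):
    t, o = np.asarray(t, float), np.asarray(o, float)
    return np.sqrt(np.mean((t - o) ** 2))                        # Eq. (3)

def mae(t, o):
    t, o = np.asarray(t, float), np.asarray(o, float)
    return np.mean(np.abs(t - o))                                # Eq. (4)

def r2(t, o):
    t, o = np.asarray(t, float), np.asarray(o, float)
    n = len(t)
    num = (n * np.sum(t * o) - np.sum(t) * np.sum(o)) ** 2
    den = (n * np.sum(t**2) - np.sum(t)**2) * (n * np.sum(o**2) - np.sum(o)**2)
    return num / den                                             # Eq. (5)
```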

In addition to high correlation, a model should present an acceptably low error to be reliable. To this end, the parameter \(OBJ\) was used to compare the performances of the different models. This parameter is a function of the \({R}^{2}\) value and the two error measures \(RMSE\) and \(MAE\) over all modeling phases, and is calculated using the following equation:

$$OBJ= \left(\frac{{n}_{tr}}{{n}_{all}}\frac{{RMSE}_{tr}+{MAE}_{tr}}{{R}_{tr}^{2}+1}\right)+\left(\frac{{n}_{val}}{{n}_{all}}\frac{{RMSE}_{val}+{MAE}_{val}}{{R}_{val}^{2}+1}\right)+\left(\frac{{n}_{tst}}{{n}_{all}}\frac{{RMSE}_{tst}+{MAE}_{tst}}{{R}_{tst}^{2}+1}\right)$$
(6)

where \(n\) is the number of data points in the associated dataset, and the subscripts \(tr\), \(val\), and \(tst\) denote the training, validation, and testing datasets, respectively.
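A direct Python transcription of Eq. (6), shown here with hypothetical split sizes and index values:

```python
def obj(splits):
    """Eq. (6): OBJ as a weighted sum over the training/validation/testing splits.

    splits : list of dicts with keys 'n', 'rmse', 'mae', 'r2' for each split.
    """
    n_all = sum(s['n'] for s in splits)
    return sum(s['n'] / n_all * (s['rmse'] + s['mae']) / (s['r2'] + 1)
               for s in splits)

# Hypothetical example with training and testing splits only
print(obj([{'n': 160, 'rmse': 3.1, 'mae': 2.4, 'r2': 0.95},
           {'n': 69,  'rmse': 3.8, 'mae': 2.9, 'r2': 0.93}]))
```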

MLR model

In this study, to predict the compressive strength of \(HS-LWAC\) using the \(MLR\) method, \(70\%\) of the data were used for training and the remaining \(30\%\) for testing. The equation obtained from the \(MLR\) model to predict the compressive strength is as follows:

$${f}_{c}=43.18+(0.031245*OPC)-(0.02493*FA)-(0.38695*SF)-(11.0925*\frac{W}{B})+(0.010443*SP)-(0.0221*Sand)+(0.025972*Gravel)-(0.00926*OPS)+(0.190928*Age)$$
(7)

Here, the \(MLR\) modeling was performed in the CurveExpert Professional software. The \(RMSE\), \(MAE\), and \({R}^{2}\) values for the testing dataset were \(9.83\;MPa\), \(7.71\;MPa\), and \(0.59\), respectively. Furthermore, the \(OBJ\) value over all the data was \(8.89\). These results show that the model developed using \(MLR\) did not provide a sufficiently accurate prediction of the compressive strength of \(HS-LWAC\).
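For convenience, Eq. (7) can be transcribed directly into code; the example mix proportions below are hypothetical and only illustrate the units expected by the equation (kg/m³ for the constituents, a dimensionless W/B, and age in days).

```python
def f_c_mlr(OPC, FA, SF, WB, SP, Sand, Gravel, OPS, Age):
    """Eq. (7): MLR estimate of the compressive strength (MPa)."""
    return (43.18 + 0.031245 * OPC - 0.02493 * FA - 0.38695 * SF
            - 11.0925 * WB + 0.010443 * SP - 0.0221 * Sand
            + 0.025972 * Gravel - 0.00926 * OPS + 0.190928 * Age)

# Hypothetical mix: OPS used as the only coarse aggregate, no FA or SF
print(f_c_mlr(OPC=480, FA=0, SF=0, WB=0.35, SP=5,
              Sand=890, Gravel=0, OPS=380, Age=28))
```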

GEP model

For acceptable performance of the \(GEP\) modeling, its associated parameters must be set appropriately. Here, \(70\%\) of the collected data were used as the training data and \(30\%\) as the validation data. Twenty designs with different parameter settings were examined to select the best \(GEP\) configuration, following the recommendations of Shahmansouri et al. (2020). The performance parameters of the \(20\) developed models are reported in Table 2. The \(OBJ\) values obtained from the performance analyses of the developed models are displayed in Fig. 6; model \(GEP18\), with the lowest \(OBJ\) value of \(4.98\), was selected as the best model.

Table 2 Statistical parameters for each \(GEP\) model
Fig. 6 The \(OBJ\) values for all \(GEP\) models (red bar is the best)

The \(GEP\) setting parameters used for developing \(GEP18\) are reported in Table 3, and the equation obtained from this model is as follows:

Table 3 \(GEP\) setting parameters used for \(GEP18\)
$${f}_{c}=f\left(OPC, FA, SF, {~}^{W}\!\left/ \!{~}_{B}\right., SP, Sand, Gravel, OPS, Age\right)={\mathrm{ET}}_{1}+{\mathrm{ET}}_{2}+{\mathrm{ET}}_{3}+{\mathrm{ET}}_{4}$$
(8)
$${ET}_{1}=\left(Gravel-\left(\mathrm{sin}SP-5.793\right)*Gravel\right)*{e}^{-5.793}$$
(8a)
$${ET}_{2}={\mathrm{tan}}^{-1}\left(Gravel-{e}^{{\mathrm{tan}}^{-1}\left(Gravel-OPC\right)}+OPS-OPS*SF+Sand\right)$$
(8b)
$${ET}_{3}=-5.793*\left(W/B+\left(W/B-5.793\right)*{\mathrm{tan}}^{-1}Age+{\mathrm{tan}}^{-1}\left(SF*Sand\right)\right)$$
(8c)
$${ET}_{4}={\mathrm{tan}}^{-1}\left(Gravel-{\mathrm{tan}}^{-1}{e}^{FA}*\left(Sand-OPS+5.793\right)-OPS*SF*Gravel\right)$$
(8d)

This model was then used to compare the performance of the \(GEP\) method with that of the other modeling methods. The expression trees of the \(GEP18\) model are shown in Fig. 7.
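Equation (8) can likewise be evaluated directly in code. The sketch below is a literal transcription of Eqs. (8a)–(8d) as printed above; it assumes radian arguments for the trigonometric terms and the operator grouping shown in the equations, and the example mix is hypothetical.

```python
from math import sin, atan, exp

C = 5.793  # numerical constant evolved by the GEP18 model

def f_c_gep18(OPC, FA, SF, WB, SP, Sand, Gravel, OPS, Age):
    """Compressive strength (MPa) from Eq. (8) = ET1 + ET2 + ET3 + ET4."""
    et1 = (Gravel - (sin(SP) - C) * Gravel) * exp(-C)                       # Eq. (8a)
    et2 = atan(Gravel - exp(atan(Gravel - OPC)) + OPS - OPS * SF + Sand)    # Eq. (8b)
    et3 = -C * (WB + (WB - C) * atan(Age) + atan(SF * Sand))                # Eq. (8c)
    et4 = atan(Gravel - atan(exp(FA)) * (Sand - OPS + C) - OPS * SF * Gravel)  # Eq. (8d)
    return et1 + et2 + et3 + et4

# Hypothetical mix proportions (same units as the database)
print(f_c_gep18(OPC=480, FA=0, SF=0, WB=0.35, SP=5,
                Sand=890, Gravel=0, OPS=380, Age=28))
```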

Fig. 7 \(GEP18\)’s expression trees

ANFIS model

In the \(ANFIS\) method, \(70\%\) of the data were used for training and \(30\%\) for testing the model. For all the models, the initial \(FIS\) was generated using the fuzzy c-means (\(FCM\)) clustering method and then fine-tuned by a hybrid optimization algorithm (Pouresmaeil et al. 2022). In this method, the number of clusters first needs to be determined; the total number of unknown model parameters (i.e., the nonlinear parameters of the membership functions and the coefficients of the linear equations in the rule outputs) must be lower than the total number of observations (the number of data points used in the training phase). In this study, the number of clusters was varied from \(2\) to \(6\), and the \(ANFIS\) models were labeled \(C2-C6\) according to the number of clusters. The models with \(2\), \(3\), \(4\), and \(5\) clusters had \(56\), \(84\), \(112\), and \(140\) unknown parameters, respectively. In the \(C6\) model, the number of unknown parameters is \(168\), which is larger than the number of training data points (i.e., \(160\)), making this model unreliable. Given the random nature of the optimization, the modeling of each \(ANFIS\) structure was repeated \(20\) times and the best result was saved. Further information about the \(ANFIS\) models is given in Table 4.
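The quoted parameter counts are consistent with two membership-function parameters per input per cluster (e.g., a Gaussian center and width, which is an assumption here) plus \(9+1\) linear consequent coefficients per rule, i.e., \(28\) parameters per cluster; a quick check:

```python
n_inputs = 9
for c in range(2, 7):                 # cluster counts 2..6
    nonlinear = c * n_inputs * 2      # MF centers and widths (assumed Gaussian)
    linear = c * (n_inputs + 1)       # consequent coefficients + intercept per rule
    print(c, nonlinear + linear)      # -> 56, 84, 112, 140, 168
```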

Table 4 Specifications for selecting reliable models

The performance parameters of the best-developed models in both the training and testing phases are provided in Table 5.

Table 5 Statistical parameters for each \(ANFIS\) model

The \(OBJ\) values of the best trained model for each cluster number are given in Fig. 8. Among the \(ANFIS\) models, the model with \(5\) clusters and an \(OBJ\) value of \(2.59\) was selected as the best model and compared with the other methods. It can also be seen that, as the number of clusters decreases, \(OBJ\) increases, indicating a weaker model performance.

Fig. 8 The \(OBJ\) values for all \(ANFIS\) models (red bar is the best)

ANN model

In the \(ANN\) method, \(70\%\) of the data were randomly selected for training, \(15\%\) for testing, and \(15\%\) for validating the model. For all the models, the Levenberg–Marquardt backpropagation algorithm was used for network training. In \(ANN\) modeling, it is necessary to specify the number of hidden layers and their neurons (Faraj et al. 2022). One hidden layer was selected for all the developed models based on the authors’ experience. The number of neurons in the hidden layer was varied from \(6\) (two-thirds of the number of input variables) to \(27\) (three times the number of input variables), and the developed \(ANN\) models were named \(n6-n27\) according to the number of neurons. Note that the numbers of neurons in the input and output layers equal the numbers of input and output variables (i.e., \(9\) and \(1\)), respectively. The modeling of each \(ANN\) structure was repeated \(20\) times and the best result was saved; in total, \(440\) \(ANN\) models were built. The transfer functions (\(TF\)s) in the hidden and output layers were of the hyperbolic tangent sigmoid type and the linear type, respectively.
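As an illustration only, the sketch below mimics the architecture search described above with scikit-learn; note that scikit-learn’s MLPRegressor does not provide Levenberg–Marquardt training, so the LBFGS solver is used as a stand-in, the data are placeholders, and the models are ranked here by test RMSE rather than the \(OBJ\) index for brevity.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.random((229, 9))                  # hypothetical normalized inputs
y = rng.random(229) * 50 + 30             # hypothetical f_c targets (MPa)

X_tr, X_tst, y_tr, y_tst = train_test_split(X, y, test_size=0.3, random_state=0)

best = (None, np.inf)
for n_neurons in range(6, 28):            # 6 to 27 hidden neurons
    for restart in range(20):             # 20 random restarts per architecture
        net = MLPRegressor(hidden_layer_sizes=(n_neurons,),
                           activation='tanh',   # sigmoid-like hidden layer
                           solver='lbfgs',      # stand-in for Levenberg-Marquardt
                           max_iter=2000,
                           random_state=restart)
        net.fit(X_tr, y_tr)
        rmse_tst = mean_squared_error(y_tst, net.predict(X_tst)) ** 0.5
        if rmse_tst < best[1]:
            best = (n_neurons, rmse_tst)

print('best architecture:', best)
```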

The performance parameters of the best-developed models, in terms of the number of neurons, are provided in Table 6 for the training, testing, and validation phases. In addition, Fig. 9 presents the \(OBJ\) value of the best-constructed model for each number of neurons. Among all the developed models, the neural network with \(17\) hidden neurons, having the lowest \(OBJ\) value of \(1.65\), was selected as the best model and used for comparison with the other modeling methods in this study.

Table 6 Statistical parameters for each \(ANN\) model
Fig. 9 The \(OBJ\) values for all \(ANN\) models (red bar is the best)

Results and discussion

Table 7 lists the values of performance parameters for the best-constructed models using the four employed methods. Before assessing and comparing the models, it must be ensured that no overfitting occurred in the modeling. Overfitting is a common issue in modeling using \(ML\)-based methods and occurs when the performance values are acceptable for the training data while they are significantly weaker for the testing data. Overfitting can be detected by comparing the four aforementioned performance parameters in the training and testing phases. As the difference in the performance parameters between the training and testing phases declines, the probability of overfitting decreases.

Table 7 Statistical parameters of different models

Considering the values reported in Table 7, no overfitting occurred in modeling using the four methods of interest. A higher \({R}^{2}\) value indicates a strong correlation between the experimental and prediction data of the models. As can be seen, \(MLR\) and \(GEP\) have weaker performances compared with \(ANN\) and \(ANFIS\) in both training and testing phases. In general, \(ANN\) has the best correlation with an excellent value of \(0.982\), followed by \(ANFIS\) with a correlation value of \(0.964\). Figure 10 shows the performance parameters schematically to allow better comparison. \(ANN\) had the lowest \(OBJ\) values in both training and testing phases and thus showed the best performance in predicting the compressive strength. After that, \(ANFIS\) and \(GEP\) respectively showed the next best performances, and \(MLR\) with an \(OBJ\) value of \(8.89\) had the weakest performance.

Fig. 10 Comparison between developed models in training, testing, and overall phases

The predicted \({f}_{c}\) values for the training and testing datasets obtained using the best models of the four described methods are plotted against the experimental \({f}_{c}\) in Fig. 11. The linear regression line with zero bias is also provided in this figure, and lines representing \(10\%\) and \(20\%\) errors relative to the line of equality are drawn in the diagrams. In the \(ANN\) model, all points except five have errors lower than \(20\%\); these exceptions correspond to compressive strength values lower than \(50\;MPa\). For \({f}_{c}\) higher than \(50\;MPa\), all points have errors lower than \(20\%\), with only four points exceeding \(10\%\). This observation indicates that the \(ANN\) model has excellent performance in predicting the \({f}_{c}\) of higher-strength concrete. The \(ANFIS\) model also performed well in predicting the compressive strength at higher strength values and was slightly weaker at lower strength values. The \(GEP\) and \(MLR\) methods, however, had weaker performances than the \(ANN\) and \(ANFIS\) methods.

Fig. 11 Predicted \({f}_{c}\) versus experimental results for training and testing dataset

Figure 12 compares the values predicted by the different models with the experimental results. According to this figure, the values predicted by \(ANN\) are considerably close to the experimental results. In contrast, the \(MLR\) and \(GEP\) models cannot satisfactorily predict the compressive strength relative to the reported experimental results.

Fig. 12 Graphical comparison of \(ANN\), \(ANFIS\), \(GEP\), and \(MLR\) models in the overall phase

For further investigation of the performance of the four developed models, the ratio of predicted to experimental values (\(Pre/Exp\)) is illustrated in Fig. 13. This ratio is another criterion for demonstrating the models’ ability to reduce errors and provide more accurate predictions; the lower the scatter of this ratio, the higher the accuracy of the developed model. As can be seen, the \(ANN\) model performed better than the other models in both the training and testing phases. The mean \(Pre/Exp\) ratios over all the data for the \(ANN\), \(ANFIS\), \(GEP\), and \(MLR\) models are \(0.998\), \(1.012\), \(1.024\), and \(1.064\), respectively. The smallest difference between the mean ratio and \(1\) is obtained for the \(ANN\) model, again demonstrating its better performance. For this model, the minimum and maximum \(Pre/Exp\) ratios are \(0.716\) and \(1.324\), respectively. The worst performance pertains to \(MLR\), with minimum and maximum \(Pre/Exp\) values of \(0.506\) and \(2.083\), respectively.

Fig. 13 Comparison of \(Pre/Exp\) ratios using \(ANN\), \(ANFIS\), \(GEP\), and \(MLR\) models in train and test phases

Figure 14 shows the error values of the developed models in the testing phase. As can be seen, the mean error values of the \(GEP\) and \(ANN\) models are \(0.03\) and \(0.28\), respectively, which are much lower than those of the \(ANFIS\) and \(MLR\) models (i.e., \(0.63\) and \(2.14\), respectively). In addition, for the \(ANN\) model, the first and third quartiles are \(-1.85\) and \(2.51\), respectively, giving an interquartile range of \(4.36\). The corresponding values for the \(ANFIS\), \(GEP\), and \(MLR\) models are \(5.16\), \(8.04\), and \(13.44\), respectively. A smaller interquartile range indicates greater concentration and lower scatter of the error data. The value of this parameter for \(ANN\) is \(15.57\), \(45.77\), and \(67.56\%\) lower than the corresponding values for \(ANFIS\), \(GEP\), and \(MLR\), respectively.
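The quoted percentage reductions follow directly from the interquartile ranges; a quick check (small deviations from the quoted \(15.57\%\) arise from using the rounded quartile values):

```python
iqr_ann = 2.51 - (-1.85)                 # 4.36, ANN interquartile range
for name, iqr in (('ANFIS', 5.16), ('GEP', 8.04), ('MLR', 13.44)):
    print(f'ANN IQR is {100 * (1 - iqr_ann / iqr):.2f}% lower than {name}')
# -> roughly 15.5%, 45.77%, and 67.56%
```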

Fig. 14 Error box plot diagram of models in the testing phase

An uncertainty analysis inspired by Monte Carlo simulation (MCS) was employed to quantify the randomness of the developed models. The prediction of compressive strength is associated with several sources of uncertainty (e.g., experimental uncertainty, uncertainty in the input predictors, and uncertainty in the model parameters) (Ashrafian et al. 2022). The MCS analysis was conducted for the MLR, GEP, ANFIS, and ANN models, and the results (e.g., median of the predicted \({f}_{c}\), mean absolute deviation (MAD), and width of the uncertainty band) are reported in Table 8. According to this table, the positive values of the average prediction error show that the \({f}_{c}\) predicted by all of the above approaches is, on average, higher than the experimental values. Also, ANN and MLR presented the lowest (\(20.370\%\)) and highest (\(38.154\%\)) uncertainty bandwidths, respectively.

Table 8 MCS uncertainty analysis of the proposed models

Conclusions

Replacing natural coarse aggregate with agricultural wastes/byproducts such as \(OPS\) in \(LWC\) production can reduce environmental impact and promote sustainable development. Precise prediction of the compressive strength of \(OPS-LWAC\) is a determinative factor in decision-making before the concrete is placed in the field. This research investigated whether different \(ML\) and regression approaches can be used to predict the \({f}_{c}\) of \(HS-OPS-LWAC\). To this end, a relatively comprehensive dataset was used to develop three models based on \(GEP\), \(ANFIS\), and \(ANN\). The performance of the developed models was then compared with the results obtained from the regression model (\(MLR\)). The following conclusions can be drawn from the results of this investigation:

  • According to the research results, all \(ML\) approaches were effectively employed to develop prediction models for the \(HS-OPS-LWAC\) compressive strength.

  • The suggested \(ML\) models performed well according to statistical evaluation indices such as \(MAE\), \(RMSE\), \({R}^{2}\), and \(OBJ\), indicating their excellent abilities and potential for further practical application.

  • The calculated correlation coefficient (\({R}^{2}\)) for the training, testing, and validation phases of all developed models (i.e., \(GEP\), \(ANFIS\), and \(ANN\)) was greater than \(0.8\), indicating a good fit between the model predictions and the experimental data.

  • The \(ANN\)-based model with \(17\) hidden neurons and an \(OBJ\) value of \(1.65\) outperformed all the other developed models. In general, the \(ANN\)-based models demonstrated better efficiency and performance than the developed \(ANFIS\)-based and \(GEP\)-based models.

  • An uncertainty analysis was performed via Monte Carlo simulation (MCS) to quantify the randomness of the developed models. The results show positive average prediction errors for all models, and the ANN model presented the lowest uncertainty bandwidth (\(20.370\%\)).

The findings of this study open up new avenues for future research using \(ML\) algorithms. To improve the generalizability of the suggested approaches, the authors will gather a continuously updated, widely accessible, and more comprehensive database in future work. To replace missing values in the database (both inputs and outputs), advanced data pre-processing approaches such as semi-supervised learning and missing-data imputation will be employed. The effectiveness of other \(ML\) approaches in forecasting the \({f}_{c}\) of \(HS-OPS-LWAC\) will also be compared. Integrated hybrid \(ML\) models, which combine \(ML\)-based techniques with high-convergence metaheuristic optimization algorithms (e.g., Seydanlou et al. 2022, Shaswat 2021), might be studied as a feasible option to increase the estimation accuracy of concrete properties (e.g., modulus of elasticity, compressive, tensile, and flexural strengths). Finally, the suggested \(ML\)-based model will be integrated into construction industry systems to make \(HS-OPS-LWAC\) easier to produce. However, further study in this area is necessary.