1 Introduction

Self-compacted concrete (SCC) is a highly flowable concrete; it is a special type that does not need compaction because it flows under its own weight without segregation. SCC is made with a high cement content and a low water-to-binder ratio, and it uses superplasticizers. Okamura first introduced the idea of SCC in 1986, and Ozawa developed the first prototype at the University of Tokyo in 1988 [1, 2].

Fly ash and silica fume are suitable substitutes for cement because cement manufacture is energy-intensive and is therefore a major source of greenhouse gases. Global energy demand is rising steadily and is expected to increase by approximately 50% by 2040 [3]. SCC has many advantages over conventional concrete, including the elimination of vibration, reduced construction time and labor costs, enhanced durability, and less noise [4], as well as better filling of highly congested structural members, an improved transition zone between cement paste and aggregate or reinforcement, and reduced permeability [5]. An effective SCC should possess three essential characteristics that set it apart from conventional concrete [6]: (1) filling ability, the capacity to flow into the formwork completely under its own weight; (2) passing ability, the capacity to move through restricted spaces between steel reinforcing bars; and (3) segregation resistance, the capacity to remain homogeneous during transport, placement, and storage. Along with good self-compactability, a designed SCC must also meet the standards for strength, volume stability, and durability of hardened concrete [7].

Due to these clear benefits, SCC has been a research focus for many years. In self-compacted concrete, shrinkage [8], rheological properties [9], strength [10], and durability [11] have been reported to be significantly affected by factors such as the composition of the raw materials, the incorporation of chemical and mineral admixtures, aggregate, packing density, water-to-cement ratio (w/c), and design method.

Despite the advantages of concrete, the current growth of the industry has negative environmental effects. It is widely recognized that the manufacture of cement, the main binder in concrete, releases a large amount of CO2 into the atmosphere. This issue can be mitigated by completely or partially replacing cement with pozzolanic materials [12]. With the addition of 40–60% fly ash, an affordable SCC can be produced with 28-day compressive strengths ranging from 26 to 48 MPa [13]. Fly ash and superplasticizer are two popular mineral and chemical admixtures, respectively, that can be used to increase the flowability and stability of SCC [14]. Fly ash is one of the most commonly used cement substitutes in concrete. Because of its rounded particle shape, it can improve the flowability of the mixture and reduce cost by lowering the cement content [15].

Machine learning techniques (MLTs) provide information models, approaches, and algorithms for analyzing complex systems and solving practical problems. Linear regression (LR), multi-linear regression (ML), artificial neural networks (ANN), support vector machines, and water cycle algorithms are now widely accessible, and continuous improvements to these methods significantly influence civil engineering, particularly in the construction and infrastructure industries. Quick and accurate strength prediction systems are therefore needed in the construction industry for pre-design and quality control. MLTs have been used in numerous studies to forecast the strength characteristics of concrete made with fly ash. These studies have helped civil engineers estimate many aspects of infrastructure and construction projects, including scheduling, quality control, time, and cost [3].

For fly ash (FA)-modified self-compacted concrete (SCC), a variety of machine learning models, such as artificial neural networks (ANN), decision trees, pure-quadratic models, full-quadratic models, interaction models, and M5P-trees, are used to predict the compressive strength (CS) and slump flow diameter (SL). Each type has distinct benefits and limitations: quadratic models are simple but may miss complex behavior; decision trees offer interpretability but may oversimplify complex relationships; ANNs are excellent at capturing complex patterns but may require large amounts of data and computation. Full-quadratic and interaction models extend simple regression to capture more complex relationships, and M5P-trees offer a compromise between interpretability and complexity. By addressing a variety of modeling challenges related to FA-based SCC properties, the application of these diverse models improves prediction robustness and advances the general understanding of the material's behavior [15].

The slump flow diameter (SL) of the SCC is a critical characteristic that needs to be examined in the fresh condition. The compressive strength (CS) of SCC, among the mechanical properties in the hardened state, is also one of the important factors in the design of engineering structures because other mechanical properties and its durability have a direct or indirect relationship with compressive strength and can be derived from CS [16, 17].

The current study compiles two databases of fly ash-based self-compacted concrete mixtures with identical input parameters. The first database comprises 303 data samples [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39] and is used to forecast compressive strength; the second comprises 86 data samples [18, 21, 22, 25, 27, 29, 37,38,39,40,41,42,43,44,45,46,47,48,49,50,51] and is used to forecast SCC slump flow diameter. The dependent parameters, CS and SL, were predicted separately from these databases. Water-to-binder ratio (w/b), cement (C), fly ash (FA), sand (S), coarse aggregate (CA), and superplasticizer (SP) are the independent parameters, each varying over the range reported in the collected data.

This study focused on using soft computing models to predict the compressive strength (CS) and slump flow diameter (SL) of self-compacted concrete (SCC) containing different amounts of fly ash (FA). The key aspects of the study are summarized as follows:

1.1 Purpose of the study

Context: SCC significantly advances concrete technology because it can self-compact without needing vibration.

Use of Fly Ash: To mitigate CO2 emissions from cement production, fly ash (FA) is employed as a cement replacement in concrete.

To assess the potential of soft computing models in predicting the CS and SL of FA-modified SCC.

Data Collection: Two databases were created, one for CS prediction (303 data points) and another for SL prediction (86 data points), compiling experimental data from previous studies.

Parameters: Both databases shared six independent parameters: water-to-binder ratio, cement, fly ash content, sand, coarse aggregate, and superplasticizer.

Three models (full-quadratic, interaction, and M5P-tree) were established for each database.

Data Division: Each database was split into training (two-thirds) and testing (one-third) sets.

The first database had 202 training and 101 testing data points.

The second database had 57 training and 29 testing data points.

Evaluation Metrics: Various statistical tools (R2, RMSE, SI, MAE, StDev, OBJ, a-20 index, Z-score) were utilized to assess model performance.

Results: The study found that the FQ and IN models demonstrated the highest accuracy and reliability in predicting CS and SL, respectively, for FA-based SCC.

Based on sensitivity analysis, cement content was identified as the most influential contributor to the mixtures.

The study showed that soft computing models can accurately predict the properties of FA-modified SCC. It also showed how important the cement content is to the mixture's properties.

This study is important because it helps improve concrete production by using different materials, like fly ash, and predictive models to speed up the process of making self-compacting concrete with the right properties.

This study addresses the lack of a reliable and precise model for the efficient use of fly ash (FA) in self-compacted concrete (SCC) mixes, a gap in the literature given the versatile applications of FA in SCC formulations. To assess and quantify the impact of mixture proportions on compressive strength (CS) and slump flow diameter (SL), the investigation considered water-to-binder ratio, fly ash content (kg/m3), cement content (kg/m3), sand content (kg/m3), coarse aggregate content (kg/m3), and superplasticizer dosage (%). Using databases from the literature, the study employed three modeling techniques, full-quadratic (FQ) [52, 53], interaction (IN) [54], and M5P-tree [55, 56], to predict CS and SL in FA-modified SCC. The efficiency of these models is evaluated using several assessment metrics, including the coefficient of determination (R2), mean absolute error (MAE), root mean squared error (RMSE), objective value (OBJ), scatter index (SI), and Z-score. This research integrates concrete technology with soft computing methods, contributing to the understanding of the complex relationships governing FA-based SCC properties and offering practical insights for sustainably optimizing concrete mixtures.

2 Objectives

The current study explores the ability of soft computing models to predict the compressive strength and slump flow diameter of fly ash-modified self-compacted concrete based on measured values from the literature. The main goals are as follows:

  (i) To conduct statistical analysis to determine how various concrete components, including cement content, water-to-binder ratio, sand content, coarse aggregate content, and superplasticizer dosage, affect the compressive strength and slump flow diameter of self-compacted concrete made with and without fly ash.

  (ii) To develop different models (FQ, IN, and M5P-tree) and identify the most reliable and accurate one for predicting the CS and SL of FA-modified SCC.

  (iii) To present a systematic multiscale model for predicting the compressive strength and slump flow diameter of self-compacted concrete with up to 70% fly ash, over different ranges of water-to-binder ratio, cement content, sand content, coarse aggregate content, and superplasticizer dosage.

  (iv) To ensure the construction industry can apply the developed models without experimental testing or theoretical restrictions.

  (v) Additionally, the primary innovation of this study is to offer mathematical models for forecasting the CS and SL of a new composite type, SCC modified with FA, for use by the construction industry.

3 Methodology

In the current study, two different databases were created. The first database is used to predict the compressive strength, and the second is used to predict the slump flow diameter of self-compacted concrete. A total of 303 data points were used to predict the compressive strength and 86 data points to predict the slump flow diameter of fly ash-modified self-compacted concrete. Each database was first separated into two groups: training and testing. The training dataset comprises two-thirds of the data points, and the testing dataset comprises one-third [57]. In the database used to predict the compressive strength, the training dataset consisted of 202 data points, while the testing dataset comprised 101 data points.

On the other hand, in the database used to predict the slump flow diameter, the training and testing group consisted of 57 and 29, respectively. Table 1 summarizes the collected data for the first database, which focused on compressive strength, and the summary of collected data for the second database is shown in Table 2. In the collected data, the same parameters with the same unit are considered for both databases, including cement content (kg/m3), water-to-binder ratio, sand content (kg/m3), fly ash content (kg/m3), coarse aggregate content (kg/m3), and superplasticizer percentage.
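The two-thirds/one-third division described above can be sketched as follows. This is only an illustration: the source does not say how records were assigned to each group, so the random shuffle, the `seed` argument, and the function name `split_two_thirds` are assumptions, and the records are placeholder indices rather than the collected mixes.

```python
import random


def split_two_thirds(records, seed=0):
    """Shuffle records and split them into training (2/3) and testing (1/3).

    The shuffle and the fixed seed are assumptions for illustration only.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * 2 / 3)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder indices standing in for the 303 CS mixes and 86 SL mixes.
cs_train, cs_test = split_two_thirds(range(303))
sl_train, sl_test = split_two_thirds(range(86))

print(len(cs_train), len(cs_test))
print(len(sl_train), len(sl_test))
```

With 303 and 86 records, rounding two-thirds of the total gives the 202/101 and 57/29 group sizes reported in the text.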

Table 1 Summary of collected data and statistical analysis for predicting compressive strength of FA-based SCC
Table 2 Summary of collected data and statistical analysis for predicting slump flow diameter of FA-based SCC

This study aimed to determine the most reliable model by analyzing the collected data with three different models. Multiple mathematical operations were carried out to develop and analyze the models. Figure 1 shows the flow chart of the steps used in this study. The subsequent sections detail data collection, analysis, modeling, and evaluation.

Fig. 1 Flow chart of the study

4 Statistical evaluation

For the SCC modified with different fly ash contents, each parameter was statistically analyzed using the mean, median, mode, StDev, variance, kurtosis, and skewness, and the maximum and minimum values were also determined. The mean is the sum of all values in the data set divided by the number of values. When the data are arranged in order, the median is the middle value. The mode is the value that appears most frequently. Variance, a measure of data spread and variability, is the average squared difference from the mean, and StDev is the square root of the variance, which describes how widely data points deviate from the mean. Kurtosis measures the “tailedness” of a distribution and indicates whether the data have heavier tails (positive excess kurtosis) or lighter tails (negative excess kurtosis) than a normal distribution. Skewness measures the asymmetry of the distribution: positive skewness implies a longer right tail, and negative skewness implies a longer left tail.
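As a rough illustration of these descriptive statistics, the sketch below computes them for a single parameter using Python's standard library. The sample values are made up and are not taken from Tables 1 or 2. Skewness and kurtosis are computed here from population central moments, which is an assumption: statistical packages often apply bias corrections, so their values may differ slightly.

```python
import statistics


def describe(values):
    """Return the descriptive statistics listed in the text for one parameter."""
    n = len(values)
    mean = statistics.mean(values)
    # Population central moments, used for skewness and excess kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),        # sample StDev (n - 1)
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,    # 0 for a normal distribution
        "min": min(values),
        "max": max(values),
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the skewness is positive (longer right tail) and the excess kurtosis is slightly negative (lighter tails than a normal distribution).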

Tables 1 and 2 include the summary of the statistical evaluation of the first database used to predict the compressive strength and the second database used to predict the slump flow of SCC, respectively.

Figures 2 and 3 provide the histogram of each parameter and the relationship plot between the parameter and the dependent parameter, compressive strength or slump flow diameter, for both the first and second databases, respectively.

Fig. 2 Histogram and marginal plots between compressive strength and (a) cement (kg/m3), (b) water-to-binder ratio, (c) fly ash (kg/m3), (d) sand (kg/m3), (e) coarse aggregate (kg/m3), and (f) superplasticizer (%) in FA-modified SCC

Fig. 3 Histogram and marginal plots between slump flow diameter and (a) cement (kg/m3), (b) water-to-binder ratio, (c) fly ash (kg/m3), (d) sand (kg/m3), (e) coarse aggregate (kg/m3), and (f) superplasticizer (%) in FA-based SCC

5 Correlation matrix between independent variables and dependent variable

The correlation matrix gives the correlation coefficient between each pair of variables, with each cell corresponding to the relationship between two variables. A value of zero indicates no linear relationship between the two variables, whereas a value approaching one in magnitude indicates a strong relationship, which can be positive or negative. Figure 4a shows that the relationships between the independent parameters and the dependent parameter, compressive strength, are generally weak. The compressive strength has its largest positive correlation with cement content (0.607) and its largest negative correlation with the water-to-binder ratio (−0.74). The remaining parameters, FA, S, CA, and SP, correlate with CS by 0.172, 0.099, −0.287, and 0.164, respectively.
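A minimal sketch of how such a correlation matrix could be computed is given below, assuming the Pearson correlation coefficient (the source does not name the coefficient type). The toy columns are invented for illustration and do not come from the study's databases.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up toy columns, not taken from the collected data.
toy = {
    "C": [300, 350, 400, 450, 500],
    "w/b": [0.50, 0.45, 0.40, 0.35, 0.30],
    "CS": [30, 36, 41, 47, 52],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Each diagonal cell equals one, and the matrix is symmetric, so only one triangle needs to be read.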

Fig. 4 Correlation matrix plot for the coefficient of correlation between the dependent and independent variables based on (a) CS and (b) SL database of FA-modified SCC

Figure 4b presents the relationships between the independent variables and the dependent variable, slump flow diameter. The greatest positive correlation, 0.572, is between slump flow and fly ash content, while the greatest negative correlation, −0.814, is with cement content. Furthermore, SL correlates with w/b, S, CA, and SP by 0.394, −0.052, 0.236, and −0.705, respectively.

6 Models

According to the statistical analysis and figures presented in Sect. 5, as well as based on the R2 value, no direct relationships can be observed between compressive strength or slump flow diameter with other variables of FA-modified SCC mixtures such as cement content, w/b ratio, fine aggregate content, coarse aggregate content, and superplasticizer percentage. Therefore, as reported below, three different models are proposed to evaluate the effect of different mixture proportions mentioned above on the CS and SL of SCC modified with FA.

In this study, the proposed models are used to predict the compressive strength and slump flow diameter of self-compacted concrete. Then the most accurate and reliable one is selected based on different comparisons and assessment criteria. The calculated CS and SL values are compared to the measured values based on the following evaluation criteria: The model should have minimum percentage errors between predicted and experimental data and lower RMSE, MAE, OBJ, SI, and higher R2 values. It should also be scientifically accurate.

The R2, RMSE, and MAE values for the training and testing datasets are used to determine the accuracy and reliability of each model. The following notation is used in the equations below: cement content (C), water-to-binder ratio (w/b), sand content (S), fly ash (FA), coarse aggregate content (CA), and superplasticizer (SP); β1 to βn are the model parameters.

6.1 Full quadratic (FQ) model

A full quadratic model is a regression model used to examine the relationship between a dependent variable and one or more independent variables. It is sometimes called a quadratic polynomial regression model, because the relationship is assumed to follow a second-degree polynomial. This study used the FQ model to relate each of the dependent variables, compressive strength and slump flow diameter, to the independent variables. The general form of the full quadratic model is shown in Eq. 1.

The full quadratic model provides flexibility to represent curved patterns in the data and captures nonlinear interactions. However, it can overfit, which complicates interpretation and requires an accurate balance between model complexity and accuracy.

$$CS, SL = \beta_1 + \beta_2 \left( C \right) + \beta_3 \left( {w/b} \right) + \beta_4 \left( {FA} \right) + \beta_5 \left( S \right) + \beta_6 \left( {CA} \right) + \beta_7 \left( {SP} \right) + \beta_8 \left( C \right)\left( {w/b} \right) + \beta_9 \left( C \right)\left( {FA} \right) + \beta_{10} \left( C \right)\left( S \right) + \beta_{11} \left( C \right)\left( {CA} \right) + \beta_{12} \left( C \right)\left( {SP} \right) + \beta_{13} \left( {w/b} \right)\left( {FA} \right) + \beta_{14} \left( {w/b} \right)\left( S \right) + \beta_{15} \left( {w/b} \right)\left( {CA} \right) + \beta_{16} \left( {w/b} \right)\left( {SP} \right) + \beta_{17} \left( {FA} \right)\left( S \right) + \beta_{18} \left( {FA} \right)\left( {CA} \right) + \beta_{19} \left( {FA} \right)\left( {SP} \right) + \beta_{20} \left( S \right)\left( {CA} \right) + \beta_{21} \left( S \right)\left( {SP} \right) + \beta_{22} \left( {CA} \right)\left( {SP} \right) + \beta_{23} \left( C \right)^2 + \beta_{24} \left( {w/b} \right)^2 + \beta_{25} \left( {FA} \right)^2 + \beta_{26} \left( S \right)^2 + \beta_{27} \left( {CA} \right)^2 + \beta_{28} \left( {SP} \right)^2$$
(1)

where β1 to β28 are defined as the model parameters.
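To make the structure of Eq. 1 concrete, the sketch below builds the 28 terms (one constant, six linear terms, fifteen pairwise products, and six squares) for a single mix in the same order as the equation. The function names and the example mix are hypothetical and serve only to illustrate the term layout.

```python
from itertools import combinations

# Variable order follows the linear terms of Eq. 1.
VARIABLES = ("C", "w/b", "FA", "S", "CA", "SP")


def full_quadratic_terms(mix):
    """Return (label, value) pairs for one mix in the order of Eq. 1."""
    terms = [("1", 1.0)]
    terms += [(v, mix[v]) for v in VARIABLES]
    terms += [(f"{a}*{b}", mix[a] * mix[b]) for a, b in combinations(VARIABLES, 2)]
    terms += [(f"{v}^2", mix[v] ** 2) for v in VARIABLES]
    return terms


def predict(betas, mix):
    """Evaluate Eq. 1 for given parameters beta1..beta28."""
    terms = full_quadratic_terms(mix)
    if len(betas) != len(terms):
        raise ValueError(f"expected {len(terms)} parameters, got {len(betas)}")
    return sum(beta * value for beta, (_, value) in zip(betas, terms))


# Hypothetical mix proportions, for illustration only.
example_mix = {"C": 400, "w/b": 0.4, "FA": 100, "S": 800, "CA": 850, "SP": 1.5}
labels = [label for label, _ in full_quadratic_terms(example_mix)]
print(len(labels))
print(labels)
```

Because `itertools.combinations` preserves the input order, the product terms come out as C with each later variable, then w/b with each later variable, and so on, matching β8 to β22 in Eq. 1.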

6.2 Interaction (IN) model

An interaction model evaluates the multiplicative effect of two or more independent variables on the dependent variable. It tests whether the value of one independent variable changes the relationship between the dependent variable and another independent variable, which indicates potential synergistic or adverse effects and makes the interconnected influences within the system easier to recognize. The IN model is used here to investigate the impact of each independent parameter on the dependent parameters CS and SL. It is a multiple linear regression model extended with pairwise product terms. Equation 2 gives the relationship between the dependent and independent parameters of fly ash-modified self-compacted concrete for the interaction model.

$$CS, SL = \beta_1 + \beta_2 \left( C \right) + \beta_3 \left( {w/b} \right) + \beta_4 \left( {FA} \right) + \beta_5 \left( S \right) + \beta_6 \left( {CA} \right) + \beta_7 \left( {SP} \right) + \beta_8 \left( C \right)\left( {w/b} \right) + \beta_9 \left( C \right)\left( {FA} \right) + \beta_{10} \left( C \right)\left( S \right) + \beta_{11} \left( C \right)\left( {CA} \right) + \beta_{12} \left( C \right)\left( {SP} \right) + \beta_{13} \left( {w/b} \right)\left( {FA} \right) + \beta_{14} \left( {w/b} \right)\left( S \right) + \beta_{15} (w/b)\left( {CA} \right) + \beta_{16} \left( {w/b} \right)\left( {SP} \right) + \beta_{17} \left( {FA} \right)\left( S \right) + \beta_{18} \left( {FA} \right)\left( {CA} \right) + \beta_{19} \left( {FA} \right)\left( {SP} \right) + \beta_{20} \left( S \right)\left( {CA} \right) + \beta_{21} \left( S \right)\left( {SP} \right) + \beta_{22} \left( {CA} \right)\left( {SP} \right)$$
(2)

where β1 to β22 are defined as the model parameters.
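The source does not state how the model parameters were estimated. One common choice, assumed here purely for illustration, is ordinary least squares. The sketch below fits a reduced two-variable interaction model (constant, a, b, a·b) to synthetic data generated from known parameters and recovers them by solving the normal equations; the six-variable model of Eq. 2 follows the same pattern with 22 columns.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_least_squares(rows, targets):
    """Estimate parameters from the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def interaction_row(a, b):
    """Design-matrix row for a two-variable interaction model."""
    return [1.0, a, b, a * b]


# Synthetic, noise-free data generated from known (made-up) parameters.
true_betas = [1.0, 2.0, 3.0, 0.5]
points = [(a, b) for a in (0.0, 1.0, 2.0) for b in (0.0, 1.0, 2.0)]
rows = [interaction_row(a, b) for a, b in points]
targets = [sum(beta * v for beta, v in zip(true_betas, row)) for row in rows]

betas = fit_least_squares(rows, targets)
print([round(beta, 6) for beta in betas])
```

Solving the normal equations directly is adequate for a small, well-conditioned example like this one; with 22 correlated columns on real data, a numerically more stable solver would usually be preferable.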

6.3 M5P-tree model

The compressive strength and slump flow diameter of fly ash-modified self-compacted concrete were also predicted using the M5P-tree model, which was first introduced in [58]. The M5P-tree model is a tree-based learner used to address regression problems; it is a hybrid model that combines linear regression and decision trees. It enhances the conventional decision tree model by associating linear regression models with the tree's leaf nodes. The M5P tree is more adaptable for modeling continuous numerical outcomes because each leaf node has a linear regression equation. The general form of the M5P-tree model formula, derived from the training dataset, is given in Eq. 3. The same formula is also used to predict the testing dataset.

$$CS, SL = \beta_1 + \beta_2 \left( C \right) + \beta_3 \left( {w/b} \right) + \beta_4 \left( {FA} \right) + \beta_5 \left( S \right) + \beta_6 \left( {CA} \right) + \beta_7 \left( {SP} \right)$$
(3)

where β1 to β7 are defined as the model parameters.
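The idea of a model tree, in which splits route a mix to a leaf and each leaf applies a linear equation of the form of Eq. 3, can be sketched as below. This shows only the prediction step, not how M5P grows, prunes, or smooths the tree. The class names, the split variable, the threshold, and all leaf coefficients are hypothetical and are not the fitted values from the study.

```python
from dataclasses import dataclass
from typing import Union

# Variable order matches beta2..beta7 in Eq. 3.
ORDER = ("C", "w/b", "FA", "S", "CA", "SP")


@dataclass
class Leaf:
    """Leaf node holding one linear model (beta1..beta7 as in Eq. 3)."""
    betas: tuple

    def predict(self, mix):
        linear = sum(b * mix[v] for b, v in zip(self.betas[1:], ORDER))
        return self.betas[0] + linear


@dataclass
class Split:
    """Internal node that routes a mix by comparing one variable to a threshold."""
    variable: str
    threshold: float
    low: Union["Split", Leaf]
    high: Union["Split", Leaf]

    def predict(self, mix):
        branch = self.low if mix[self.variable] <= self.threshold else self.high
        return branch.predict(mix)


# Hypothetical two-leaf tree: split on w/b, one linear model per leaf.
tree = Split(
    variable="w/b",
    threshold=0.40,
    low=Leaf((50.0, 0.02, 0.0, 0.0, 0.0, 0.0, 0.0)),
    high=Leaf((30.0, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0)),
)

mix_low = {"C": 400, "w/b": 0.35, "FA": 100, "S": 800, "CA": 850, "SP": 1.5}
mix_high = {"C": 300, "w/b": 0.50, "FA": 100, "S": 800, "CA": 850, "SP": 1.5}
print(tree.predict(mix_low), tree.predict(mix_high))
```

Each mix is evaluated by exactly one leaf equation, which is what keeps the model piecewise linear and comparatively easy to interpret.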

7 Assessment criteria

Assessment tools such as R2 [59], MAE [60], RMSE [61], SI [62], OBJ [63], a20-index [55], StDev [64], and Z-score [65] are used to evaluate and characterize the developed models for the training and testing datasets; these tools are defined in Eqs. 4–11. R-squared, or the coefficient of determination, is a statistical tool used to evaluate the level of agreement between predicted and measured values in a regression model. It measures the proportion of the variance in the measured values that is explained by the model's predictions. The MAE is the average magnitude of the errors between measured and predicted values. It is a metric frequently used in statistics and machine learning to assess the precision of a prediction model.

The RMSE is the square root of the mean squared residual between the predicted and observed values. Because the errors are squared before averaging, it penalizes large errors more heavily than the MAE and reflects the spread of the errors of a predictive model.

In addition, SI expresses the RMSE relative to the mean of the observations. It is calculated by dividing the RMSE by the mean of the measured values (Eq. 7); multiplying the result by 100 expresses it as a percentage.

The OBJ function is a significant assessment tool in regression model development. It combines the RMSE, MAE, and R2 of the training and testing datasets, each weighted by its share of the total data, into a single value that indicates how well the predicted values match the measured data. Other assessment tools, such as the a-20 index, can also be used to evaluate the developed models. Furthermore, the StDev describes how widely data points typically deviate from the mean.

$$R^2 = \left( {\frac{{ \sum_i (xi - \overline{x})*(yi - \overline{y})}}{{\sqrt {{\sum_i \left( {xi - \overline{x}} \right)^2 }} *\sqrt {{\sum_i \left( {yi - \overline{y}} \right)^2 }} }}} \right)^2$$
(4)
$$MAE = \frac{{\sum_{i = 1}^n \left| {yi - xi} \right|}}{n}$$
(5)
$$RMSE = \sqrt {{\frac{{\sum_{i = 1}^n (yi - xi)^2 }}{n}}}$$
(6)
$$SI = \frac{RMSE}{{\overline{y} }}$$
(7)
$$OBJ = \left( {\frac{{n_{tr} }}{{n_{all} }} \times \frac{{RMSE_{tr} + MAE_{tr} }}{{R_{tr}^2 + 1}}} \right) + \left( {\frac{{n_{tst} }}{{n_{all} }} \times \frac{{RMSE_{tst} + MAE_{tst} }}{{R_{tst}^2 + 1}}} \right)$$
(8)
$$a20 - index = \frac{N20}{N}$$
(9)
$${\text{StDev}} = \sqrt {{\frac{1}{n - 1}\mathop \sum \limits_{i = 1}^n \left( {xi - \overline{x}} \right)^2 }}$$
(10)
$$Z = \frac{{m - \overline{y}}}{SD}$$
(11)

where xi = predicted value, \(\overline{x}\) = average of predicted values, yi = measured value, \(\overline{y}\) = average (mean) of measured values, ntr = number of data points in the training dataset, ntst = number of data points in the testing dataset, nall = total number of training and testing data points, N = total number of data points, and N20 = number of data points whose predicted-to-measured ratio lies between 0.8 and 1.2. Also, m = each predicted data point, SD = sample StDev of measured values, and Z = the z-score.

Typically, the R2 and a-20 index values range from 0 to 1, with 1 being the best. The RMSE, MAE, and OBJ values range from 0 to ∞; they should all be as low as possible, with zero being ideal. Furthermore, the model performs well if the SI value is smaller than 0.1, whereas SI values in the ranges 0.1–0.2, 0.2–0.3, and above 0.3 indicate good, fair, and poor model performance, respectively [62]. Because the StDev measures the variation or dispersion of the dataset, its value theoretically ranges between 0 and ∞; higher values indicate more deviation from the mean, while 0 denotes no variation (all the data points are the same).
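The assessment criteria can be sketched in code as follows. The function names and the sample values are illustrative assumptions. R2 is computed as the squared correlation between measured and predicted values, MAE as the mean absolute difference (as its name implies), SI as RMSE divided by the mean of the measured values, and the a20-index as the share of predictions whose predicted-to-measured ratio lies between 0.8 and 1.2.

```python
import math


def mae(measured, predicted):
    """Mean absolute error."""
    return sum(abs(y - x) for y, x in zip(measured, predicted)) / len(measured)


def rmse(measured, predicted):
    """Root mean squared error."""
    squared = sum((y - x) ** 2 for y, x in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def r_squared(measured, predicted):
    """Squared correlation between measured and predicted values."""
    n = len(measured)
    mean_y, mean_x = sum(measured) / n, sum(predicted) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, measured))
    var_x = sum((x - mean_x) ** 2 for x in predicted)
    var_y = sum((y - mean_y) ** 2 for y in measured)
    return cov ** 2 / (var_x * var_y)


def scatter_index(measured, predicted):
    """RMSE relative to the mean of the measured values."""
    return rmse(measured, predicted) / (sum(measured) / len(measured))


def a20_index(measured, predicted):
    """Share of predictions within 0.8 to 1.2 times the measured value."""
    hits = sum(1 for y, x in zip(measured, predicted) if 0.8 <= x / y <= 1.2)
    return hits / len(measured)


def obj(train, test):
    """OBJ from (measured, predicted) pairs for the training and testing sets."""
    n_all = len(train[0]) + len(test[0])

    def part(pair):
        m, p = pair
        return len(m) / n_all * (rmse(m, p) + mae(m, p)) / (r_squared(m, p) + 1)

    return part(train) + part(test)


# Made-up values for illustration only.
measured = [10, 20, 30, 40]
predicted = [13, 18, 33, 39]
print(mae(measured, predicted), rmse(measured, predicted))
print(scatter_index(measured, predicted), a20_index(measured, predicted))
print(r_squared(measured, predicted))
```

In this example only the first prediction (ratio 1.3) falls outside the 0.8 to 1.2 band, so the a20-index is 0.75.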

The theoretical range of z-scores is between negative infinity and positive infinity. In practice, depending on the distribution of the data, the majority of the z-scores fall within a specific range. Most z-scores in a normal distribution fall roughly between −3 and +3. Any data point with a z-score between −1 and +1 is considered normal or common for the dataset, meaning that the value is within one StDev of the mean. Outliers are defined as z-scores lower than −3 or greater than +3.
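A short sketch of the z-score check is shown below. It standardizes each predicted point by the mean and sample StDev of the measured values and flags points with |Z| > 3 as outliers. The function names and numbers are made up for illustration.

```python
import statistics


def z_scores(predicted, measured):
    """Z-score of each predicted point relative to the measured data."""
    mean = statistics.mean(measured)
    sd = statistics.stdev(measured)  # sample StDev (n - 1)
    return [(m - mean) / sd for m in predicted]


def outliers(predicted, measured, limit=3.0):
    """Predicted points whose absolute z-score exceeds the limit."""
    scores = z_scores(predicted, measured)
    return [m for m, z in zip(predicted, scores) if abs(z) > limit]


# Made-up values for illustration only.
measured_values = [40, 42, 44, 46, 48]
predicted_values = [44, 47, 60]
scores = z_scores(predicted_values, measured_values)
flagged = outliers(predicted_values, measured_values)
print([round(z, 2) for z in scores])
print(flagged)
```

Here the first point sits exactly at the measured mean, the second is within one StDev, and the third lies more than three StDev away and is flagged.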

8 Results and discussion

8.1 Relation between calculated and actual SCC properties

8.1.1 Full quadratic (FQ) model

The full quadratic model was used to predict the compressive strength and slump flow diameter of self-compacted concrete modified with different fly ash contents. The FQ model contains a constant, linear terms, pairwise product (interaction) terms, and squared terms, which makes it one of the more flexible models considered here. Equations 12 and 13 show the relationship between the dependent and independent parameters for compressive strength and slump flow diameter, respectively, based on the training dataset of each database. The variation of predicted and measured CS and SL for the FQ model is shown in Fig. 5. Figure 5a demonstrates that for the training dataset, R2 is 0.97 and RMSE is 2.57 MPa, whereas for the testing dataset, R2 is 0.83 and RMSE is 8.99 MPa. In this model, the predicted CS data lie within error lines of −15% and +30%.

Fig. 5 Relationship between measured and predicted (a) CS and (b) SL for FQ model using training and testing dataset

Figure 5b presents the relationship between measured and predicted SL of SCC. For the training dataset, R2 is 0.80 and RMSE is 12.5 mm; for the testing dataset, R2 is 0.58 and RMSE is 28.4 mm. The FQ model error lines are −10% to +15%.

$$CS = - 190 + 0.0004\left( C \right) - 0.57\left( \frac{w}{b} \right) - 0.072\left( {FA} \right) + 0.17\left( S \right) + 0.18\left( {CA} \right) - 6.56\left( {SP} \right) - 0.023\left( C \right)\left( \frac{w}{b} \right) + 0.0003\left( C \right)\left( {FA} \right) + 0.00004\left( C \right)\left( S \right) + 0.0001\left( C \right)\left( {CA} \right) + 0.028\left( C \right)\left( {SP} \right) - 0.023\left( \frac{w}{b} \right)\left( {FA} \right) - 0.057\left( \frac{w}{b} \right)\left( S \right) + 0.021\left( \frac{w}{b} \right)\left( {CA} \right) + 0.0005\left( \frac{w}{b} \right)\left( {SP} \right) + 0.0001\left( {FA} \right)\left( S \right) + 0.0001\left( {FA} \right)\left( {CA} \right) - 0.015\left( {FA} \right)\left( {SP} \right) - 0.0001\left( S \right)\left( {CA} \right) - 0.0003\left( S \right)\left( {SP} \right) + 0.001\left( {CA} \right)\left( {SP} \right) - 0.00001\left( C \right)^2 + 0.001\left( \frac{w}{b} \right)^2 + 0.0001\left( {FA} \right)^2 - 0.00002\left( S \right)^2 - 0.0001\left( {CA} \right)^2 + 4.188\left( {SP} \right)^2$$
(12)

No. of training dataset = 202, R2 = 0.97, RMSE = 2.57 MPa

$$SL = 586.2 + 0.0004\left( C \right) - 0.56\left( \frac{w}{b} \right) - 0.028\left( {FA} \right) + 0.14\left( S \right) + 0.238\left( {CA} \right) - 6.5\left( {SP} \right) - 0.021\left( C \right)\left( \frac{w}{b} \right) + 0\left( C \right)\left( {FA} \right) + 0.00003\left( C \right)\left( S \right) - 0.00001\left( C \right)\left( {CA} \right) + 0.027\left( C \right)\left( {SP} \right) - 0.02\left( \frac{w}{b} \right)\left( {FA} \right) - 0.035\left( \frac{w}{b} \right)\left( S \right) + 0.024\left( \frac{w}{b} \right)\left( {CA} \right) + 0.00005\left( \frac{w}{b} \right)\left( {SP} \right) + 0.0006\left( {FA} \right)\left( S \right) + 0.0001\left( {FA} \right)\left( {CA} \right) - 0.018\left( {FA} \right)\left( {SP} \right) - 0.0001\left( S \right)\left( {CA} \right) - 0.0027\left( S \right)\left( {SP} \right) + 0.0005\left( {CA} \right)\left( {SP} \right) - 0.0003\left( C \right)^2 + 0.014\left( \frac{w}{b} \right)^2 + 0.0002\left( {FA} \right)^2 - 0.00003\left( S \right)^2 - 0.0001\left( {CA} \right)^2 - 0.496\left( {SP} \right)^2$$
(13)

No. of training dataset = 57, R2 = 0.80, RMSE = 12.5 mm.

8.1.2 Interaction (IN) model

The compressive strength and slump flow diameter of self-compacted concrete modified with various fly ash contents were also predicted using the interaction model. The model comprises a constant, linear terms, and pairwise product (interaction) terms. Equations 14 and 15 show the relationship between the dependent and independent variables for compressive strength and slump flow diameter, derived from the training dataset of each database. The formulas were then applied to the testing datasets to check their reliability and accuracy.

Figure 6 displays the relationship between the predicted and measured CS and SL. As shown in Fig. 6a, the training dataset has an R2 of 0.96 and an RMSE of 2.96 MPa, and the testing dataset has an R2 of 0.86 and an RMSE of 8.4 MPa. The model has error lines of − 15% and 30%.

Fig. 6

Relationship between measured and predicted a CS and b SL for IN model using training and testing dataset

The IN model performed differently in predicting SL, as shown in Fig. 6b: the R2 was 0.93 and 0.51, and the RMSE was 7.5 and 29.1 mm, for the training and testing datasets, respectively. The model has error lines of − 7% to 15%.

$$CS = - 98.29 + 0.14\left( C \right) - 36.1\left( \frac{w}{b} \right) + 0.072\left( {FA} \right) + 0.068\left( S \right) + 0.02\left( {CA} \right) - 17.46\left( {SP} \right) - 0.006\left( C \right)\left( \frac{w}{b} \right) + 0.00005\left( C \right)\left( {FA} \right) + 0.00002\left( C \right)\left( S \right) + 0.00003\left( C \right)\left( {CA} \right) - 0.0086\left( C \right)\left( {SP} \right) + 0.002\left( \frac{w}{b} \right)\left( {FA} \right) - 0.034\left( \frac{w}{b} \right)\left( S \right) + 0.042\left( \frac{w}{b} \right)\left( {CA} \right) + 0.008\left( \frac{w}{b} \right)\left( {SP} \right) + 0.00001\left( {FA} \right)\left( S \right) + 0.00005\left( {FA} \right)\left( {CA} \right) - 0.003\left( {FA} \right)\left( {SP} \right) + 0.00001\left( S \right)\left( {CA} \right) + 0.01\left( S \right)\left( {SP} \right) + 0.014\left( {CA} \right)\left( {SP} \right)$$
(14)

No. of training dataset = 202, R2 = 0.96, RMSE = 2.96 MPa

$$SL = 759.2 + 0.024\left( C \right) - 0.01\left( \frac{w}{b} \right) + 0.558\left( {FA} \right) - 0.166\left( S \right) + 0.003\left( {CA} \right) - 9.9\left( {SP} \right) - 0.859\left( C \right)\left( \frac{w}{b} \right) - 0.0002\left( C \right)\left( {FA} \right) - 0.00005\left( C \right)\left( S \right) + 0.0003\left( C \right)\left( {CA} \right) + 0.005\left( C \right)\left( {SP} \right) - 0.1\left( \frac{w}{b} \right)\left( {FA} \right) + 0.55\left( \frac{w}{b} \right)\left( S \right) + 0.003\left( \frac{w}{b} \right)\left( {CA} \right) + 13.32\left( \frac{w}{b} \right)\left( {SP} \right) - 0.0001\left( {FA} \right)\left( S \right) - 0.0004\left( {FA} \right)\left( {CA} \right) + 0.00002\left( {FA} \right)\left( {SP} \right) - 0.0001\left( S \right)\left( {CA} \right) + 0.006\left( S \right)\left( {SP} \right) - 0.005\left( {CA} \right)\left( {SP} \right)$$
(15)

No. of training dataset = 57, R2 = 0.93, RMSE = 7.5 mm.
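
As an illustration of how the interaction model is applied, the following minimal sketch evaluates Eq. 14 for a single mix. The coefficients are copied from Eq. 14; the function name, the dictionary layout, and the key `wb` (standing for w/b) are assumptions made for this example, and the inputs are assumed to be expressed in the same units as the database used to fit the model. Equation 15 could be evaluated the same way by swapping in its coefficients.

```python
from itertools import combinations

# Variable order used to generate the pairwise products; "wb" stands for w/b.
VARS = ("C", "wb", "FA", "S", "CA", "SP")

# Coefficients copied from Eq. 14 (CS, interaction model).
INTERCEPT = -98.29
LINEAR = {"C": 0.14, "wb": -36.1, "FA": 0.072, "S": 0.068, "CA": 0.02, "SP": -17.46}
PAIRWISE = {
    ("C", "wb"): -0.006, ("C", "FA"): 0.00005, ("C", "S"): 0.00002,
    ("C", "CA"): 0.00003, ("C", "SP"): -0.0086,
    ("wb", "FA"): 0.002, ("wb", "S"): -0.034, ("wb", "CA"): 0.042, ("wb", "SP"): 0.008,
    ("FA", "S"): 0.00001, ("FA", "CA"): 0.00005, ("FA", "SP"): -0.003,
    ("S", "CA"): 0.00001, ("S", "SP"): 0.01,
    ("CA", "SP"): 0.014,
}


def predict_cs_in(mix):
    """Evaluate Eq. 14 for one mix given as a dict keyed by the names in VARS."""
    value = INTERCEPT
    for name in VARS:
        value += LINEAR[name] * mix[name]
    # Six variables give 15 unordered pairs, one product term per pair.
    for a, b in combinations(VARS, 2):
        value += PAIRWISE[(a, b)] * mix[a] * mix[b]
    return value
```

Keeping the coefficients in dictionaries keyed by variable name makes it easier to check each entry against the printed equation than a single long arithmetic expression would.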

8.1.3 M5P-tree model

The M5P-tree model is the last model used to predict the compressive strength and slump flow diameter of self-compacted concrete. Equations 16 and 17 show that the model takes the form of a linear equation. The equations were derived from the training dataset and then applied to the testing dataset to check the accuracy and reliability of the model. As Eq. 17 shows, the developed model retains cement, water-to-binder ratio, and fly ash as the effective parameters for the SL of SCC, whereas sand, coarse aggregate, and superplasticizer were eliminated because they had little or no effect. Figure 7 shows the relationship between measured and predicted CS and SL values. In Fig. 7a, the training dataset has an R2 of 0.965 and an RMSE of 2.82 MPa, whereas the testing dataset has an R2 of 0.89 and an RMSE of 4.36 MPa. The error lines are between − 15% and 20%. As shown in Fig. 7b, in predicting slump flow, the M5P-tree model provided an R2 of 0.86 and an RMSE of 10.4 mm for the training dataset and an R2 of 0.58 and an RMSE of 23.9 mm for the testing dataset, with error lines of − 8% to 10%.

$$CS = - 129.3 + 0.185\left( C \right) - 35.9\left( \frac{w}{b} \right) + 0.13\left( {FA} \right) + 0.068\left( S \right) + 0.067\left( {CA} \right) + 2.76\left( {SP} \right)\quad \text{LM1}\ \left( 202/18.65\% \right)$$
(16)

No. of training dataset = 202, R2 = 0.965, RMSE = 2.82 MPa

Fig. 7

Relationship between measured and predicted a CS and b SL for M5P-tree model using training and testing dataset

$$SL = 719.0 - 0.167\left( C \right) + 75.9\left( \frac{w}{b} \right) + 0.087\left( {FA} \right)\quad \text{LM1}\ \left( 57/37.24\% \right)$$
(17)

No. of training dataset = 57, R2 = 0.86, RMSE = 10.4 mm.
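
Because Eqs. 16 and 17 are plain linear formulas, they can be evaluated directly. The sketch below does so with the coefficients copied from the two equations. The function names and the example mix are hypothetical and chosen only to exercise the formulas; the inputs are assumed to be in the same units as the database used to fit the models.

```python
def predict_cs_m5p(c, wb, fa, s, ca, sp):
    """Compressive strength (MPa) from the linear model LM1 of Eq. 16."""
    return (-129.3 + 0.185 * c - 35.9 * wb + 0.13 * fa
            + 0.068 * s + 0.067 * ca + 2.76 * sp)


def predict_sl_m5p(c, wb, fa):
    """Slump flow diameter (mm) from the linear model LM1 of Eq. 17.

    Sand, coarse aggregate, and superplasticizer do not appear in Eq. 17.
    """
    return 719.0 - 0.167 * c + 75.9 * wb + 0.087 * fa


# Hypothetical mix proportions, not taken from the paper's database.
mix = dict(c=400, wb=0.4, fa=100, s=800, ca=800, sp=5)
cs = predict_cs_m5p(**mix)
sl = predict_sl_m5p(mix["c"], mix["wb"], mix["fa"])
print(f"CS = {cs:.2f} MPa, SL = {sl:.2f} mm")
```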

8.2 Model comparison

The study assessed the potential of soft computing models for predicting the compressive strength and slump flow diameter of self-compacted concrete modified with different fly ash contents. The effect of fly ash content was also evaluated through the different computational models. CS and SL were predicted separately using three alternative models: FQ, IN, and M5P-tree. Each model provided an equation based on different mathematical terms, and several assessment criteria were used to evaluate the performance of each developed model.

In predicting the compressive strength, based on the R2, RMSE, and MAE values, the FQ model has the highest accuracy and reliability for the training dataset, whereas the M5P-tree model performs best for the testing dataset. For the training dataset, the FQ model has an R2 of 0.97, RMSE of 2.57 MPa, and MAE of 1.97 MPa; for the testing dataset, the M5P-tree model has an R2 of 0.89, RMSE of 4.36 MPa, and MAE of 2.51 MPa. In addition, more data points lie along the Y = X line for the FQ model, which has error lines of − 15% to 30% for the training dataset, with 90% of the data falling between 0.85 and 1.3 (predicted CS/measured CS). Figure 8 presents the statistical criteria for the developed models.
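
The three criteria used in this comparison can be computed as in the following minimal sketch. The defining equations lie outside this section, so the definitions here are assumptions: RMSE and MAE follow their usual forms, and R2 is taken as the squared Pearson correlation between measured and predicted values. The function names and the sample values in the usage note are illustrative only.

```python
import math


def rmse(measured, predicted):
    """Root mean squared error, in the units of the measured quantity."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def mae(measured, predicted):
    """Mean absolute error, in the units of the measured quantity."""
    n = len(measured)
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / n


def r_squared(measured, predicted):
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_m * var_p)
```

For example, with measured values `[30, 40, 50, 60]` and predictions `[32, 38, 53, 59]`, the MAE is 2.0 and the RMSE is about 2.12. Under this definition R2 measures only linear association, so a model with a systematic bias can still score close to 1; this is one reason to read it together with RMSE and MAE.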

Fig. 8

Performance evaluation of the developed models in predicting compressive strength based on R2, RMSE, and MAE for a training b testing dataset

In predicting slump flow diameter, the IN model ranked first for the training dataset and the M5P-tree model for the testing dataset. For the training dataset, the IN model provides an R2 of 0.93, RMSE of 7.5 mm, and MAE of 5.6 mm; for the testing dataset, the M5P-tree model has an R2 of 0.58, RMSE of 23.9 mm, and MAE of 21.4 mm. The IN model has error lines of − 7% to 15% for the training dataset. The statistical results for the developed models are shown in Fig. 9. Figure 10 compares the proposed models based on the testing dataset; the model values lie within error lines of − 20% to 30% for CS and − 6% to 13% for SL.

Fig. 9

Performance evaluation of the developed models in predicting slump flow diameter based on R2, RMSE, and MAE for a training b testing dataset

Fig. 10

Comparison between measured and predicted a CS and b SL in LR, IN, and M5P-tree models for the testing dataset

The OBJ function was also used to compare the models developed using the training dataset. Figure 11 shows pie charts of the OBJ values for the CS and SL predictions. The M5P-tree model had the lowest OBJ value, 0.22 for CS and 18.98 for SL prediction, indicating the most accurate model. Another evaluation criterion was the scatter index (SI). In predicting CS, the SI value of all the models was below 0.1 for the training dataset, indicating excellent performance: 0.069, 0.079, and 0.075 for the FQ, IN, and M5P-tree models, respectively. In predicting SL, all the models provided SI values of less than 0.05, as shown in Fig. 12.
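
The scatter index can be sketched as follows, assuming the common definition of SI as the RMSE divided by the mean of the measured values (the defining equation is outside this section). The function name and the sample values in the usage note are illustrative.

```python
import math


def scatter_index(measured, predicted):
    """Scatter index: RMSE normalised by the mean measured value (assumed definition)."""
    n = len(measured)
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    return rmse / (sum(measured) / n)


def is_excellent(si):
    """Threshold quoted in the text: SI below 0.1 indicates excellent performance."""
    return si < 0.1
```

For measured values `[30, 40, 50, 60]` and predictions `[32, 38, 53, 59]`, the RMSE is about 2.12 and the mean is 45, giving an SI of about 0.047. Because SI is dimensionless, it allows the CS (MPa) and SL (mm) predictions to be compared on the same scale.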

Fig. 11

Comparison of developed models based on the OBJ function; a CS and b SL

Fig. 12

Comparison of developed models based on Scatter Index; a CS and b SL [62]

In addition, the highest a-20 index value, 100.3%, was observed for the FQ and M5P-tree models for the training dataset in predicting the compressive strength of SCC modified with FA (Fig. 13). All the models maintained an a-20 index of 100.0% in predicting SL for the training dataset.
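
The a-20 index can be sketched as below, assuming the usual definition: the share of samples whose measured-to-predicted ratio lies between 0.80 and 1.20. The defining equation is outside this section, so the ratio direction and the function name are assumptions.

```python
def a20_index(measured, predicted):
    """Fraction of samples with measured/predicted within [0.80, 1.20]."""
    within = sum(1 for m, p in zip(measured, predicted) if 0.8 <= m / p <= 1.2)
    return within / len(measured)


# Hypothetical values: the last prediction overshoots by more than 20%.
share = a20_index([30, 40, 50, 60], [32, 38, 53, 80])
print(f"a-20 index = {100 * share:.1f}%")
```

Under this definition the index is a proportion, so it lies between 0 and 100%.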

Fig. 13

Comparison of developed models using a-20 index; a CS and b SL

The developed models are also compared using the Taylor diagram (Fig. 14), which displays the variation between measured and predicted CS and SL in terms of their standard deviation (StDev) and correlation coefficient (R2). For compressive strength (Fig. 14a), all three models have a high correlation coefficient, and their StDev is very close to the experimental StDev. For slump flow diameter (Fig. 14b), however, the models differ in their R2 values.
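
The quantities plotted on a Taylor diagram can be computed as in this minimal sketch. It assumes population standard deviations and the Pearson correlation coefficient; the function name is illustrative, and whether the paper's diagram uses exactly these conventions is not stated in this section.

```python
import math


def taylor_stats(measured, predicted):
    """Return (StDev measured, StDev predicted, correlation, centred RMS difference)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    sd_m = math.sqrt(sum((m - mean_m) ** 2 for m in measured) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted)) / n
    r = cov / (sd_m * sd_p)
    # Centred RMS difference: RMS error after removing each series' mean.
    crmsd = math.sqrt(
        sum(((p - mean_p) - (m - mean_m)) ** 2 for m, p in zip(measured, predicted)) / n
    )
    return sd_m, sd_p, r, crmsd
```

These four numbers are linked by the law of cosines, crmsd² = sd_m² + sd_p² − 2·sd_m·sd_p·r, which is what allows a single point on the diagram to encode all of them: a model plots close to the reference point when its StDev matches the measured StDev and its correlation is high.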

Fig. 14

Taylor diagram analysis using StDev and correlation coefficient to assess the proposed models based on; a CS and b SL value of FA-modified SCC

In addition, the proposed models were compared using the Z-score, as shown in Fig. 15. The Z-score describes where a value lies in the data distribution relative to the mean. It is obtained by subtracting the measured mean value from the predicted value and then dividing the result by the StDev of the measured data. As shown in Fig. 15a, in predicting compressive strength, about 67%, 69%, and 68% of the total data points fall between − 1 and + 1 for the FQ, IN, and M5P-tree models, respectively. The proposed models are considered accurate and reliable because most of the data lie within ± 1, a range near zero. For slump flow diameter (Fig. 16), most data points are also between − 1 and + 1: 77%, 81%, and 79% for the FQ, IN, and M5P-tree models, respectively. This result is a further indicator of the good performance of the developed models.
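
The Z-score check can be sketched as follows, assuming z = (predicted value − mean of measured data) / StDev of measured data and using the sample standard deviation. The paper does not state which StDev variant was used, and the function names and sample values are illustrative.

```python
import statistics


def z_scores(measured, predicted):
    """Z-score of each prediction relative to the measured mean and StDev."""
    mean_m = statistics.mean(measured)
    sd_m = statistics.stdev(measured)  # sample StDev (assumption)
    return [(p - mean_m) / sd_m for p in predicted]


def share_within_one(z):
    """Fraction of Z-scores lying between -1 and +1 inclusive."""
    return sum(1 for v in z if -1.0 <= v <= 1.0) / len(z)


# Hypothetical values, not taken from the paper's database.
z = z_scores([30, 40, 50, 60], [35, 38, 53, 59])
print(f"{100 * share_within_one(z):.0f}% of predictions fall within ±1")
```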

Fig. 15

Z-score comparison of developed models in predicting compressive strength

Fig. 16

Z-score comparison of developed models in predicting slump flow diameter

Z-score values are frequently used to assess the position of individual data points within a dataset, expressed in standard deviations from the mean, when evaluating the performance of a model. Z-scores between − 1 and + 1 are generally regarded as typical because the corresponding predictions lie within one standard deviation of the dataset mean. Z-scores outside this range do not necessarily indicate that the predictions are inaccurate or unusual; rather, they may indicate data points whose features deviate markedly from the average. In our analysis, Fig. 16 contains Z-score values outside this range, indicating cases in which the predicted values differed significantly from the dataset mean. These deviations nonetheless offer useful information about how the predictions are distributed and whether outside influences may affect the results. The context and features of the dataset under study should therefore be taken into account when interpreting Z-score values outside the usual range.

8.3 Sensitivity analysis

A sensitivity analysis [66] was carried out to determine the most influential parameters, using the most reliable and accurate proposed model and the training dataset. Since the FQ model ranked first on the statistical assessment criteria for predicting compressive strength, it was used in the sensitivity analysis. In each round, one parameter is removed and the model is rerun, and the R2, RMSE, and MAE values are recorded. As illustrated in Fig. 17a, the cement content has the highest impact on the CS of SCC modified with fly ash. The cement content contributes about 25% of the total, followed by sand content (19%), coarse aggregate content (17%), fly ash content (16%), water-to-binder ratio (14%), and superplasticizer dosage (10%).

Fig. 17

The percentage contribution of independent parameters of FA-modified SCC in predicting a CS using the FQ and b SL using IN model

For slump flow diameter prediction, the IN model was used in the sensitivity analysis because it performed best among the models on the training dataset. Figure 17b shows the contribution percentage of the independent variables. The most influential variable for SL is the cement content (22%), followed by the water-to-binder ratio, sand, and superplasticizer (16% each), and the coarse aggregate and fly ash contents (15% each).
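
The leave-one-parameter-out procedure described in this section can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: a plain linear least-squares model stands in for the FQ and IN models, only the RMSE is tracked, and the contribution of each variable is taken as its share of the total RMSE increase, which the text does not specify. All function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and non-singular (no collinear inputs).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_rmse(rows, y):
    """Fit y = intercept + linear terms by least squares; return the training RMSE."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [t - sum(b * v for b, v in zip(beta, x)) for x, t in zip(X, y)]
    return math.sqrt(sum(e * e for e in resid) / len(y))


def contributions(rows, y, names):
    """Remove one variable per round, refit, and convert RMSE increases to percentages."""
    base = fit_rmse(rows, y)
    increase = {}
    for i, name in enumerate(names):
        reduced = [r[:i] + r[i + 1:] for r in rows]
        increase[name] = max(fit_rmse(reduced, y) - base, 0.0)
    total = sum(increase.values())
    if total == 0.0:
        return {name: 0.0 for name in names}
    return {name: 100.0 * v / total for name, v in increase.items()}
```

Calling `contributions(rows, y, names)` with the six mix parameters as columns of `rows` would return one percentage per parameter, summing to 100. Other normalisations, such as the drop in R2, would give different percentages.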

9 Limitations/future works

  1. Additional soft computing models, beyond the three developed in this study, should be examined to identify the most accurate and reliable model for forecasting the CS and SL of FA-modified SCC.

  2. Experiments can be carried out to confirm the results of the developed models.

  3. Only 86 data points were used to predict the slump flow diameter. The database can be expanded and can include other SCC parameters.

  4. With more data points, a validation dataset can be added alongside the training and testing datasets so that the models can be checked against more data.

  5. Applying the models discussed here to the properties of high-strength fiber-reinforced concrete (FRC) with different fiber dosages is one possible avenue for future work. Investigating the predictive power of the FQ, IN, and M5P-tree models for high-strength FRC may give insight into the interactions between mixture components and fiber dosages.

10 Conclusions

This investigation aimed to identify and propose an accurate and dependable model for forecasting the compressive strength and slump flow diameter of self-compacted concrete modified with various fly ash types and quantities. From the literature, 303 data samples for compressive strength and 86 for slump flow diameter of FA-modified SCC were collected. These samples varied in mixture proportions: water-to-binder ratio, cement content, sand content, coarse aggregate content, and superplasticizer dosage. The following conclusions can be drawn based on the data collected and the output of the three modeling approaches:

  1. The fly ash content ranges from 0 to 525 kg/m3 in the compressive strength (CS) database and from 0 to 468 kg/m3 in the slump flow diameter (SL) database, with median values of 133 kg/m3 and 143 kg/m3, respectively. These ranges show the variety of fly ash dosages used to produce self-compacted concrete mixtures in the two datasets.

  2. Based on the statistical tools used, such as R2, MAE, and RMSE, the FQ and IN models had the highest accuracy and reliability for predicting compressive strength and slump flow diameter, respectively, on the training dataset, while the M5P-tree model ranked first for both predictions on the testing dataset.

  3. In CS prediction, the FQ model has the highest training R2 (0.97, against 0.83 for the testing dataset) and the lowest training RMSE (2.57 MPa) and MAE (1.97 MPa). In SL prediction, the IN model has an R2 of 0.93, RMSE of 7.5 mm, and MAE of 5.6 mm for the training dataset. On the testing datasets, however, the M5P-tree model provides the highest R2 and the lowest RMSE and MAE for both CS and SL.

  4. Additional statistical evaluation tools, such as the SI value and the OBJ function, were also used. The M5P-tree model had the lowest OBJ values, 0.22 and 18.98 for CS and SL, respectively. Regarding the SI value, the FQ model performed very well in predicting CS, with 0.069 for the training and 0.0269 for the testing datasets. In SL prediction, the IN model provided the lowest SI value, 0.011, for the training dataset.

  5. According to the Z-score results in predicting CS, about 67%, 69%, and 68% of the total data points fall between − 1 and + 1 for the FQ, IN, and M5P-tree models, respectively. In predicting SL, 77%, 81%, and 79% of the total data points are between − 1 and + 1 for the FQ, IN, and M5P-tree models, respectively. This result is a further indicator of the good performance of the developed models.

  6. Sensitivity analysis shows that cement content is the most influential parameter for FA-modified SCC in both CS and SL predictions.