1 Introduction

Historically, concrete has been the most prevalent building material globally (Rob & Noel, 2003). Its extensive application in construction projects around the world has cemented its status as a reliable and essential component of human infrastructure. Since the era of the Roman Empire, concrete has played a crucial role in the development and enhancement of various cultures. The "magic" of concrete lies not only in its transformation from a fluid to a solid state but also in its affordability, wide availability, malleability, plasticity, adaptability, high compressive strength, stiffness, and durability (Falliano et al., 2018; Rahmat & Mousavi, 2012; Rob & Noel, 2003; Seung-Chang, 2003). Researchers continue to explore this specialized field of concrete engineering with enthusiasm, revealing promising prospects for innovative materials such as reactive powder concrete, whose compressive strength can range from 200 to 800 MPa (Falliano et al., 2018; Rahmat & Mousavi, 2012; Seung-Chang, 2003), underscoring the material's versatility and potential for future advancements.

It is standard practice to test a sufficiently large sample of a concrete mixture in order to draw conclusions about the entire batch. Concrete testing at 7 days is crucial for assessing early-age strength (Gagg, 2014; Rana et al., 2022; Shi et al., 2015), but concrete is assumed to reach its design strength at 28 days, so specimens must also be examined at that age. The 28-day test results serve as a basis for predicting the compressive strength of concrete over time. These procedures align with established design standards, yet such standards may prove inadequate for concrete compositions that extend beyond conventional cement, aggregate, and water. Calculating the compressive strength of a mix becomes more complex when alternative components such as pozzolans and admixtures are introduced, especially in high-strength concrete formulations (Ashish & Verma, 2019; Danish & Ganesh, 2021; EFNARC, 2002; Esmaeilkhanian et al., 2014; Surya et al., 2020). Factors such as admixtures and temperature can significantly affect compressive strength outcomes. In such cases, the empirical relationships suggested by standards often fail to capture the full complexity, necessitating reliable models that can assess concrete strength at various ages promptly, which is crucial for projects that cannot wait for traditional 28-day test results. The time saved translates into tangible economic benefits, given the productivity-driven nature of construction processes. Exploring alternative approaches therefore becomes imperative, although there remains considerable scope for debate regarding the efficacy of these non-traditional methodologies.

There has been considerable interest in recent years in the application of artificial neural networks (ANNs) for predicting concrete compressive strength (Bayer et al., 2019; Kandiri et al., 2020). Various studies have explored hybrid models such as fuzzy logic (FL), genetic programming (GP), and ANNs to forecast the influence of ground granulated blast furnace slag on concrete strength over time (Golafshani et al., 2015). The cascade correlation type of ANN has enabled rapid learning, albeit with moderate accuracy, effectively capturing the nonlinear patterns inherent in concrete properties (Gesoǧlu et al., 2010; Öztaş et al., 2006; Siddique et al., 2011). This capability is particularly advantageous for problems such as compressive strength estimation, offering savings in time and cost. The combination of artificial neural networks with meta-heuristic algorithms has shown promising results in addressing complex challenges in structural engineering. For instance, hybrid multilayer perceptron (HMP) networks have been beneficial for determining concrete compressive strength in scenarios such as deep beams connected to shear walls (Behnood & Golafshani, 2018; Behnood et al., 2017; Chakravarthy et al., 2023; Kovačević et al., 2021; Siddique et al., 2011). ANNs and evolutionary search methods such as genetic algorithms (GAs) are integrated in Evolutionary Artificial Neural Networks (EANNs), as elucidated in Behnood et al. (2017) and Chakravarthy et al. (2023), while Kovačević et al. (2021) lay out the framework for EANNs. These networks have proven highly efficacious in detecting structural defects and predicting compressive strength in concrete (Aiyer et al., 2014; Golafshani et al., 2020a; Kovačević et al., 2021). Consequently, EANNs are being investigated as a potentially cost-effective alternative to both expensive mathematical models and destructive testing methods for accurately estimating concrete compressive strength.

2 Methodology

This review focuses on machine learning models and their application to predicting the compressive strength of self-compacting concrete. To achieve this objective, a comprehensive literature review was conducted using scholarly databases and journals, including Elsevier, Springer, SCOPUS, Web of Science, the Turkish Journal of Engineering, IEEE, and Science Direct. These sources were selected based on their relevance and contribution to the research topic. A total of 81 relevant sources were identified and analyzed. Figure 1 illustrates the research methodology adopted in this study, outlining the systematic approach followed in sourcing and reviewing the literature.

Fig. 1

Flow chart of the approach used in obtaining relevant articles for the research

A total of 80 articles sourced from academic journals constituted the bulk of the scholarly references analyzed in this review. In addition, the EFNARC guidelines and specifications for self-compacting concrete were consulted, bringing the total number of academic sources to 81. On this basis, the review systematically evaluates the application of machine learning models to predicting the compressive strength of self-compacting concrete in modern construction.

3 Results and discussions

3.1 Predicting concrete compressive and flexural strength

There are numerous techniques for predicting compressive strength as adopted by researchers across the world.

3.1.1 Empirical methods

In an effort to faithfully represent experimental behavior within a specified context, empirical methodologies have been developed. Their significance lies in their ability to quantify the contribution of each component to the final outcome. The established relationships must align with experimental data, necessitating comprehensive empirical analysis. In forecasting compressive strength, many empirical models correlate strength with the water-cement ratio, which typically has an inverse effect on strength. Alternatively, some models incorporate existing compressive strength data and employ empirically derived coefficients to establish a link between available and required information (Danish & Ganesh, 2021; Sonebi, 2004).
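
A classical instance of such an empirical strength versus water-cement relation is an Abrams-type law, shown here purely as an illustrative form with empirically fitted, mix-specific constants A and B:

$$f_{c} = \frac{A}{B^{\,w/c}}$$

where fc is the compressive strength, w/c the water-to-cement ratio, and A and B constants obtained by fitting experimental data for a given mix family.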

3.1.2 Computational modelling

Complex thermodynamics-based equations serve as the fundamental basis for finite element analysis, a computational modeling technique widely employed in engineering and materials science. This approach depends crucially on accurate representations of concrete microstructures. To simulate processes such as hydration across the range of cement particle sizes using pixel-based computer simulations, cement particles must be randomly distributed within a unit cell. Additionally, these processes can be encoded empirically after adjustment against experimental data (Danish & Ganesh, 2021; Jovic et al., 2019; Sonebi, 2004).

3.1.3 Mechanical modeling

In mechanical models, parameters are commonly conceptualized using a spring-and-dashpot analogy, in which the cement matrix corresponds to the spring and the temporal factor (age at the time of testing) corresponds to the dashpot. This framework is applied to predict the compressive strength of concrete. However, when the model is tested against experimental data, its predictive accuracy declines; in particular, the dashpot component strongly influences compressive strength at early ages (Danish & Ganesh, 2021; Jovic et al., 2019; Sonebi, 2004).

3.1.4 Statistical methods

To delineate the interrelationships among variables precisely, statistical methodologies leverage empirical data and employ mathematical formulations. Multilinear regression is by far the most widely used statistical approach. Although statistical approaches are intuitive and simple to implement, their data-heavy nature can be a hindrance in some situations. Their effectiveness also varies with the mathematical function used to fit the data, making them less reliable than alternative approaches (Danish & Ganesh, 2021; Günal & Mehdi, 2023; Jovic et al., 2019; Sonebi, 2004).

3.1.5 Regression analysis

Regression analysis is widely acknowledged as an essential component of robust statistical modeling strategies. Despite its utility in computing coefficients that quantify efficiency gains (Behnood & Golafshani, 2020; Madani et al., 2020; Zhou et al., 2016), the approach can be intricate and challenging to follow. Notwithstanding these complexities, establishing confidence intervals for predictions requires rigorous mathematical underpinnings, and correlation analysis further elucidates how key variables influence the final outcome. R², the coefficient of determination (the square of the correlation coefficient R), is a metric used to compare the effectiveness of different regression equations: it assesses the model's ability to account for the variation in output that results from differences in input values. A value of 0 indicates that the regression model fails to explain any of the variance in Y, whereas a value of 1 indicates that all points lie on the regression line. Moreover, some findings from slump tests and concrete density have been presented to complement the mix proportions of constituent materials, aiding in the estimation of compressive strength in high-performance concrete (Behnood & Golafshani, 2020; Madani et al., 2020; Zhou et al., 2016).
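
As a minimal illustration of how R² is computed, the sketch below fits a one-variable linear regression to hypothetical strength data; the variable names and values are assumptions for demonstration only, not data from the reviewed studies.

```python
import numpy as np

# Hypothetical example: regress compressive strength (MPa) on water-cement ratio
# and compute R^2 (coefficient of determination).
wc_ratio = np.array([0.35, 0.40, 0.45, 0.50, 0.55, 0.60])   # assumed inputs
strength = np.array([62.0, 55.5, 49.8, 44.1, 40.2, 36.5])   # assumed outputs (MPa)

# Least-squares fit y = a*x + b
a, b = np.polyfit(wc_ratio, strength, deg=1)
predicted = a * wc_ratio + b

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((strength - predicted) ** 2)
ss_tot = np.sum((strength - strength.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(f"R^2 = {r_squared:.4f}")  # 1.0 means all points lie on the regression line
```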

The reviewed work aimed to enhance the accuracy of test results at different concrete ages so as to better reflect the development of strength over time. The final multi-variable power equation was refined by incorporating additional independent variables, achieving a correlation coefficient of 99.99%. Furthermore, significant correlations between predicted compressive strength and experimental data were identified, as documented in the literature (Chiew et al., 2017). Standardization of model inputs using a fixed matrix formula was implemented to optimize the formulation of concrete mixtures. Regression methods have traditionally been favored for their simplicity and efficiency in modeling, particularly where non-linear relationships between reactants and products are minimal, as in many concrete applications. This effectiveness is highlighted in previous studies (Sergio & Mauro, 1997; Seung-Chang, 2003), which compared regression methods with artificial neural networks and concluded that the former offer superior predictive capabilities.

3.1.6 Artificial intelligence

According to Webster's New World College Dictionary, artificial intelligence (AI) is defined as "the ability of machines or programs to function in ways that simulate human intelligence in tasks such as reasoning and learning." This definition prompts an inquiry into the types of problems that might necessitate a computational approach that emulates human cognitive processes. Examples of such problems include knowledge-based inference with partial or unclear data, various forms of perception and learning, as well as tasks involving control, prediction, classification, and optimization (Erzin, 2007; Syed et al., 2023). These challenges provide contexts in civil engineering where AI techniques can be employed to replicate phenomena whose underlying processes are not fully understood. Neural networks and genetic algorithms represent two distinct yet conceptually similar AI techniques. Neural networks are designed to mimic the brain's cognitive processes, whereas genetic algorithms are inspired by the principles of natural selection and 'mutation' to enhance performance. Additionally, Adaptive Network-based Fuzzy Inference Systems (ANFIS) and fuzzy systems are two further examples of modern artificial intelligence. ANFIS integrates fuzzy logic and neural networks to leverage the strengths of both approaches, utilizing linguistic interpretations of variables and adaptive learning processes to generate effective models (Erzin, 2007; Syed et al., 2023). The effectiveness of AI-based methods lies in their ability to learn from data and produce relevant models for a broad array of applications. However, achieving results that approximate real-world conditions often necessitates a data set that is both extensive and high-quality.

3.2 Fuzzy logic

Lotfi Zadeh is credited with pioneering the field of fuzzy logic (FL), as documented in seminal works (Duan et al., 2013; Sergio & Mauro, 1997; Silva et al., 2021; Syed et al., 2023). His contributions, including the development of fuzzy inference systems such as Mamdani, Takagi, and Sugeno, fundamentally transformed methods for knowledge representation.

Unlike the binary nature of traditional Boolean logic, which strictly adheres to values of "completely false" (0) or "completely true" (1), fuzzy logic introduces a nuanced approach based on degrees of membership (Duan et al., 2013; Erzin, 2007; Sergio & Mauro, 1997; Silva et al., 2021; Syed et al., 2023). In contrast to the rigid boundaries of Boolean logic, where information is represented in black-and-white terms, fuzzy logic acknowledges shades of gray and accommodates the inherent uncertainties in real-world data. An individual may, for instance, possess data on the heights of 10 persons, categorized into two groups: 'tall' individuals ranging from 1.75 m to 2.20 m and 'short' individuals ranging from 1.50 m to 1.74 m. In such scenarios, the use of crisp sets may appear inadequate. Scholars (Duan et al., 2013; Erzin, 2007; Sergio & Mauro, 1997; Syed et al., 2023) have criticized this dichotomous approach as “overly simplistic” and often “out of touch with reality.” Fuzzy sets offer a more precise and robust method of representing such data. According to Sergio and Mauro (1997), Syed et al. (2023), Erzin (2007), and Duan et al. (2013), a fuzzy set is defined as a class of objects with degrees of membership that form a continuum. Here, a membership function assigns each item a score between zero and one, reflecting its degree of membership in the set.

Modelling non-linear systems and designing sophisticated controllers are two pivotal applications of fuzzy logic control (FLC), a robust mathematical framework. In concrete, each constituent component serves a distinct purpose, both individually and collectively, and this interplay leads to a nonlinear relationship between constituent quantities and the resulting compressive strength. FLC accommodates this nonlinearity and is applied in scenarios where the system's complexity precludes the use of traditional modeling approaches (Duan et al., 2013; Erzin, 2007; Sergio & Mauro, 1997; Syed et al., 2023).
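
A minimal sketch of the membership idea behind the height example above is given below; the transition breakpoints (1.65 m and 1.85 m) are assumptions chosen only for illustration.

```python
def tall_membership(height_m: float) -> float:
    """Degree of membership (0..1) in the fuzzy set 'tall'.

    Illustrative linear ramp around the crisp 1.74/1.75 m boundary: fully
    'short' below 1.65 m, fully 'tall' above 1.85 m, and a smooth transition
    in between. The breakpoints are assumptions for demonstration only.
    """
    if height_m <= 1.65:
        return 0.0
    if height_m >= 1.85:
        return 1.0
    return (height_m - 1.65) / (1.85 - 1.65)

for h in (1.55, 1.74, 1.75, 1.88, 2.00):
    tall = tall_membership(h)
    print(f"{h:.2f} m -> tall: {tall:.2f}, short: {1 - tall:.2f}")
```

Heights near the crisp 1.74/1.75 m cut-off receive partial membership in both sets, which is precisely the nuance that crisp sets cannot express.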

4 Neural Network

Neural networks are transforming approaches to problem solving. Since the inception of artificial intelligence in the 1950s, neural networks have spearheaded efforts to extend machines' capabilities beyond physical labor to intellectual tasks. The term "artificial neural network" encompasses various conceptualizations analogous to biological neural networks: axons represent outputs, synapses signify weights, and the ubiquitous neuron is also referred to as a "processing element." According to Silva et al. (2021), Duan et al. (2013), Alade et al. (2018), and Iqbal et al. (2020), artificial neural networks can be succinctly described as a class of massively parallel architectures that address complex problems by coordinating the efforts of many relatively simple processors (or "artificial neurons"). A perceptron exemplifies a simple neural network with a direct input-to-output mapping, whereas more intricate neural networks feature multiple layers and employ diverse activation functions. Classification of artificial neural networks sheds light on these diverse structures and functionalities.
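
The following minimal sketch shows what such a "processing element" computes: a weighted sum of inputs plus a bias, passed through a nonlinear activation. The layer sizes, weights, and input values are illustrative assumptions (the network is untrained, and real inputs would first be scaled).

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer network: each artificial neuron
    computes a weighted sum of its inputs plus a bias, followed by a
    nonlinear activation (here tanh)."""
    hidden = np.tanh(x @ w1 + b1)   # hidden-layer activations
    return hidden @ w2 + b2         # linear output (e.g., predicted strength)

# Illustrative shapes: 4 inputs (e.g., water, cement, coarse and fine aggregate,
# normalized), 6 hidden neurons, 1 output. Weights are random, i.e., untrained.
w1, b1 = rng.normal(size=(4, 6)), np.zeros(6)
w2, b2 = rng.normal(size=(6, 1)), np.zeros(1)

x = np.array([[0.45, 0.62, 0.71, 0.68]])  # assumed, already-normalized mix features
print(mlp_forward(x, w1, b1, w2, b2))
```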

Damage detection, structural system identification, material behavior modelling, structural settlement analysis, control and optimization strategies, groundwater monitoring, and determination of concrete mix proportions are just some of the many areas where ANNs have been put to use (Saridemir, 2010). In a study referenced by Gandomi and Roke (2015), ANNs were explored alongside fuzzy logic in prediction models for the compressive strength of self-compacting concrete. Their ANN model, featuring a hidden layer of 6 neurons, underwent 500 iterations. Comparable success rates were achieved, although discrepancies in measurement accuracy persisted; notably, the ANN model exhibited superior performance with an R² value of 0.9767 compared to the fuzzy logic approach. Subsequent refinement involving an ANN configuration with eleven neurons in a single hidden layer, and later a two-hidden-layer setup (nine neurons in the first layer and eight in the second), resulted in further accuracy improvements, achieving the lowest absolute percentage error (0.000515) (Khan et al., 2021). Reference (Faradonbeh et al., 2018) also considered the advantages of a multi-layer architecture. Material considerations are integral to this model, evident in the utilization of chemical analysis data for fly ash, gradient, and sand compositions. The model achieved R² values of 0.9557 for compressive strength and 0.9119 for flexural tensile strength, indicating good predictive performance. The influence of the water-to-binder ratio on compressive strength was also analyzed, revealing a decrease in strength as the ratio increased, thereby informing the compressive strength forecasting model (Faradonbeh et al., 2018; Khandelwal et al., 2017). Their model demonstrated high accuracy, with an R² of 0.9944 in reproducing experimental outcomes and 0.9767 for predicting testing samples. Comparative analyses of concrete strength prediction methodologies, including artificial neural network (ANN) approaches (Ferreira, 2002, 2006; Gholampour et al., 2017), highlight the ANN's superior performance over multiple linear regression models, especially for low and medium strength concretes.

4.1 Genetic programming

Gene Expression Programming (GEP) is a genetic programming framework that evaluates candidate computer programs on a specific task using a set of instructions and a fitness evaluation process. GEP operates as a variant of genetic algorithms (GA), in which each node represents a segment of code executed by the computer. The methodology seeks to improve program performance by strategically relocating program components to optimal positions, subject to predefined conditions that must be met. Previous studies have identified three key genetic manipulations (Faradonbeh et al., 2018; Ferreira, 2002, 2006; Khandelwal et al., 2017).

Some critical adjustment factors are elucidated in these articles (Faradonbeh et al., 2018; Ferreira, 2002, 2006; Khandelwal et al., 2017). The genetic programming (GP) building process begins with an initial population size of 49 without cement/FA substitution and 27 with a cement/FA substitution rate of 0.15. The curing period serves to further categorize each dataset within the GP model, mirroring its role in the ANN model. Specifically targeting the 28-day compressive strength, the GP model incorporates four primary input parameters: water, cement, coarse aggregate, and fine aggregate.

References (Ferreira, 2002, 2006; Khandelwal et al., 2017) presented regression equations for predicting in situ concrete compressive strength. Their models were derived from comprehensive datasets encompassing ready-mixed concrete mixture compositions and corresponding on-site compressive strength assessments. The study utilized 1,442 compressive strength tests across 68 distinct concrete mix designs, characterized by specified compressive strengths ranging from 18 to 27 MPa, water-to-cement ratios varying between 0.39 and 0.62, and maximum aggregate sizes spanning 25 to 100 mm. Additionally, references (Ferreira, 2002, 2006; Gholampour et al., 2017) tested a proposed model for predicting the compressive strength of concrete using in situ data.

4.2 Parameters in machine learning models

Hyperparameters in machine learning are external settings that guide decision-making and influence the learning process of algorithms. Machine learning engineers must configure these hyperparameters before training the algorithm (Asteris et al., 2021; Mehmannavaz et al., 2014; Neira et al., 2020). Hyperparameters encompass various elements such as the learning rate, the number of clusters in a clustering algorithm, and the number of branches in a regression tree. During the training process, the algorithm adjusts its internal parameters—such as weights and biases—based on the hyperparameters and the training data. These internal parameters are refined through training to achieve optimal model performance. Ideally, the final model parameters should fit the data set precisely without underfitting or overfitting (Asteris et al., 2021; Mehmannavaz et al., 2014; Neira et al., 2020).

A hyperparameter in machine learning is a parameter that governs the learning process itself, as distinct from the parameters whose values are determined through training (typically node weights). Hyperparameters fall into two main categories: model hyperparameters, which pertain to the model selection task and cannot be inferred during training, and algorithm hyperparameters, which affect the speed and quality of learning but in principle have no influence on the model's performance. The topology and size of a neural network are two examples of model hyperparameters (Asteris et al., 2021; Mehmannavaz et al., 2014; Neira et al., 2020).

Algorithm hyperparameters include the learning rate, batch size, and mini-batch size, where a mini-batch denotes a smaller sample set and the batch size can refer to the entire data sample. While some straightforward methods (such as ordinary least squares regression) require none, other model training procedures require several hyperparameters. Given a set of hyperparameters, the training algorithm then learns the model parameters from the data. The selection of a model's hyperparameters can affect how long it takes to train and test. A hyperparameter is typically of an integer or continuous type, leading to mixed-type optimization problems. Some hyperparameters may exist only contingent upon the values of others; for example, the size of each hidden layer in a neural network is conditional upon the number of layers (Asteris et al., 2021; Mehmannavaz et al., 2014; Neira et al., 2020).
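
To make the distinction concrete, the sketch below fixes hyperparameters such as the network topology, learning rate, and mini-batch size before training, while the weights and biases are learned during fitting. The estimator choice (scikit-learn's MLPRegressor) and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hyperparameters: chosen by the engineer BEFORE training.
model = MLPRegressor(
    hidden_layer_sizes=(11,),   # network topology: one hidden layer of 11 neurons
    learning_rate_init=0.001,   # learning rate
    batch_size=16,              # mini-batch size
    max_iter=500,               # training iterations
    random_state=0,
)

# Parameters (weights and biases): learned FROM the data during fit().
rng = np.random.default_rng(0)
X = rng.random((100, 4))                 # assumed mix-proportion features
y = 50 - 40 * X[:, 0] + 5 * X[:, 1]      # assumed synthetic strength target
model.fit(X, y)
print("learned weight matrix shapes:", [w.shape for w in model.coefs_])
```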

4.3 Untrainable parameters

Hyperparameters play a crucial role in enhancing a model’s capacity, yet their aggressive tuning may lead to suboptimal outcomes by driving the loss function towards undesired minima. This phenomenon, known as overfitting, occurs when the model begins to excessively capture noise in the data rather than faithfully representing its underlying structure. Consequently, the learning process of these hyperparameters from the training set may be hindered. For example, consider the degree of a polynomial equation used in a regression model. If this degree were treated as a trainable parameter, the model could potentially adjust it to perfectly fit the training data, thereby minimizing training error. However, such an approach often sacrifices generalization performance, as the model becomes overly specialized to the training dataset and fails to generalize well to unseen data.

4.4 Tuning in ML

Only a few hyperparameters account for most of the performance variation of machine learning models. The extent to which adjusting an algorithm, a hyperparameter, or a combination thereof enhances performance is referred to as its tuning capability. Among the various hyperparameters, the most critical for machine learning models are the learning rate and the network architecture. In contrast, empirical evidence suggests that batch size and momentum exert negligible influence on model performance (Asteris et al., 2021; Mehmannavaz et al., 2014; Neira et al., 2020). Research has shown that mini-batch sizes ranging from 2 to 32 generally yield good results, despite occasional arguments in favor of larger mini-batch sizes in the thousands.
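
A hedged sketch of such tuning, restricted to the learning rate and network topology identified above as most influential, might use a cross-validated grid search; the synthetic data and parameter grids are assumptions, not values from the reviewed studies.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.random((120, 4))                 # assumed features
y = 45 - 35 * X[:, 0] + 8 * X[:, 2]      # assumed synthetic target

# Tune only the hyperparameters reported as most influential
# (learning rate and network topology); batch size is left at its default.
param_grid = {
    "learning_rate_init": [0.0005, 0.001, 0.01],
    "hidden_layer_sizes": [(6,), (11,), (9, 8)],
}
search = GridSearchCV(
    MLPRegressor(max_iter=1000, random_state=0),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
```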

4.5 Robustness of ML models

Learning processes are inherently stochastic, so performance evaluations based on particular hyperparameter settings may not faithfully represent true performance. Methods that are not robust to straightforward changes in hyperparameters, random seeds, or even different implementations of the same algorithm cannot be incorporated into mission-critical control systems without considerable simplification and hardening. In particular, reinforcement learning algorithms must have their performance evaluated across a large number of random seeds, and their sensitivity to hyperparameter selection must be assessed; because of the high variance, evaluating these algorithms with a limited number of random seeds inadequately captures their performance dynamics (Asteris et al., 2021; Mehmannavaz et al., 2014; Neira et al., 2020). Certain reinforcement learning techniques, such as Deep Deterministic Policy Gradient (DDPG), are more sensitive to hyperparameter selection than others.
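
A simple way to probe this sensitivity is to repeat the whole train-and-evaluate cycle over many random seeds and report the spread of scores. The sketch below does this with a small regressor as a stand-in (not a reinforcement learning agent) on an assumed synthetic dataset; a large standard deviation would flag a seed-sensitive pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
X = rng.random((150, 4))                 # assumed features
y = 40 - 30 * X[:, 0] + 10 * X[:, 3]     # assumed synthetic target

scores = []
for seed in range(10):
    # Re-split and re-train with a new seed each time.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = MLPRegressor(hidden_layer_sizes=(11,), max_iter=1000, random_state=seed)
    model.fit(X_tr, y_tr)
    scores.append(r2_score(y_te, model.predict(X_te)))

print(f"R^2 over 10 seeds: mean={np.mean(scores):.3f}, std={np.std(scores):.3f}")
```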

4.6 Optimization

Hyperparameter optimization seeks a tuple of hyperparameters that yields an optimal model, i.e., one that minimizes a predetermined loss function on held-out test data. The objective function receives a set of hyperparameters and returns the associated loss.
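
A minimal sketch of this idea follows: an objective that maps a hyperparameter tuple to a loss, minimized here by random search. The search space, hyperparameter names, and the analytic stand-in loss are assumptions chosen only to keep the example self-contained; in practice the objective would be a cross-validated error of the real model.

```python
import random

def objective(learning_rate: float, hidden_neurons: int) -> float:
    """Receives a set of hyperparameters and returns the associated loss.
    Stand-in analytic 'loss' so the example runs on its own."""
    return (learning_rate - 0.001) ** 2 * 1e6 + (hidden_neurons - 9) ** 2

search_space = {
    "learning_rate": [0.0005, 0.001, 0.005, 0.01],
    "hidden_neurons": [6, 8, 9, 11],
}

random.seed(0)
best_loss, best_hp = float("inf"), None
for _ in range(20):  # simple random search over the assumed space
    hp = {name: random.choice(values) for name, values in search_space.items()}
    loss = objective(**hp)
    if loss < best_loss:
        best_loss, best_hp = loss, hp

print("best hyperparameters:", best_hp, "loss:", best_loss)
```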

4.7 Reproduction in ML models

In addition to fine-tuning hyperparameters, machine learning entails reproducibility checks, parameter and result organization, and storage. Without robust infrastructure, research code often undergoes frequent modifications, risking essential aspects such as reproducibility and record-keeping. Online collaboration platforms for machine learning facilitate seamless exchange, organization, and communication of experiments, data, and algorithms among scientists. Reproducing deep learning models, in particular, presents notable challenges (Khan, 2012).

4.8 Creating machine learning models

Machine learning models are developed using various approaches that leverage labeled data, unlabeled data, or a combination of both (Khan, 2012; Onyelowe et al., 2021). There are four main categories of machine learning algorithms:

1. Supervised learning: This method involves training algorithms on labeled data, where each data point is associated with a known label. The labels serve as a guide for the algorithm to learn to classify new data accurately. Supervised learning ensures that the resulting model aligns with the intended classification objectives set by researchers (a minimal sketch follows this list).

2. Unsupervised learning: Algorithms under unsupervised learning operate on unlabeled data. Without predefined labels, these algorithms autonomously identify patterns and structures within the data. This approach is particularly useful for researchers seeking to uncover hidden patterns and structures in datasets.

3. Semi-supervised learning: This technique trains an algorithm by combining labeled and unlabeled data. Initially, the algorithm is trained on a small set of labeled data to establish a foundational understanding. Subsequently, it utilizes a larger pool of unlabeled data to further refine its learning and improve model performance.

4. Reinforcement learning: In reinforcement learning, algorithms learn through interaction with an environment in which they receive feedback in the form of rewards or penalties. The goal is to maximize cumulative rewards over time by selecting actions that lead to positive outcomes while avoiding negative ones. Reinforcement learning can guide unsupervised machine learning by providing incentives for discovering beneficial patterns.
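
As referenced in item 1 above, the following is a minimal supervised-learning sketch: a model is fitted on labeled examples and then evaluated on held-out data. The estimator, synthetic data, and parameter values are illustrative assumptions, not from the reviewed studies.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Every row of X is "labeled" with a known target y (here a synthetic stand-in
# for a measured 28-day compressive strength).
rng = np.random.default_rng(3)
X = rng.random((200, 4))                                    # assumed mix features
y = 42 - 28 * X[:, 0] + 6 * X[:, 1] + rng.normal(0, 1.5, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("MAE on unseen data:", mean_absolute_error(y_te, model.predict(X_te)))
```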

4.9 Types of machine learning models

In the realm of machine learning, problems can generally be categorized into two main types: classification and prediction. These categories represent fundamental challenges addressed by machine learning algorithms. Depending on the nature of the problem, algorithms are tailored to develop models that are either classifiers or predictors (Prasad et al., 2019). Notably, some algorithms can be trained to produce models suitable for both regression (used for predictive modeling) and classification tasks. A variety of well-established algorithms are employed for constructing regression and classification models, as listed below:

Classification Models:

1. Support Vector Machines
2. Random Forests
3. Decision Trees
4. Logistic Regression
5. Naive Bayes
6. K-Nearest Neighbor (KNN)

Regression Models:

1. K-Nearest Neighbor (KNN) Regression
2. Decision Trees
3. Random Forests
4. Neural Network Regression
5. Linear Regression

4.10 Overfitting in machine learning (ML)

In machine learning (ML), overfitting denotes a condition in which a model exhibits diminished performance on new, unseen data because it aligns too closely with the training data from which it was derived. The primary objective is to enhance the model's ability to generalize to novel data instances, ensuring its applicability beyond the specific dataset used for training (Dias & Pooliyadda, 2001; Nguyen et al., 2019). Overfitting commonly manifests in tasks such as image recognition and natural language processing; once it is mitigated, models achieve improved precision on new data and provide dependable forecasts for practical applications. Overfitting arises from several factors: insufficient data, inclusion of extraneous information in the dataset, prolonged training on a specific dataset, and excessive model complexity, all of which cause the model to become overly intricate and to adhere closely to idiosyncrasies within the training data. Techniques for identifying and addressing overfitting include monitoring all loss metrics, scrutinizing the learning curve, integrating regularization techniques, conducting cross-validation, and visually inspecting predictions to verify alignment with the training data.

Various strategies are available to prevent overfitting in machine learning, including dropout, feature selection, early stopping, cross-validation, regularization (such as L1 and L2), and data augmentation. When a machine learning model becomes excessively attuned to the nuances of the training data, impeding its ability to generalize effectively to new data instances, it is said to suffer from overfitting. This phenomenon is particularly noticeable in neural networks, where the model assigns undue significance to incidental details in the training dataset. Addressing overfitting is crucial for ensuring the model's accuracy in making predictions about new data, distinguishing between essential patterns and irrelevant noise.
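
As a minimal sketch of two of these strategies, the example below combines L2 regularization and early stopping in a single scikit-learn MLPRegressor; the synthetic dataset and parameter values are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.random((200, 4))
y = 40 - 30 * X[:, 0] + 5 * X[:, 2] + rng.normal(0, 2.0, 200)  # noisy synthetic target

# Two of the strategies named above, combined in one estimator:
#  - alpha adds an L2 penalty to the loss (regularization),
#  - early_stopping holds out a validation fraction and stops training
#    once the validation score stops improving.
model = MLPRegressor(
    hidden_layer_sizes=(11,),
    alpha=1e-3,                 # L2 regularization strength
    early_stopping=True,        # stop before the model starts to memorize noise
    validation_fraction=0.2,
    n_iter_no_change=20,
    max_iter=2000,
    random_state=0,
)
model.fit(X, y)
print("training stopped after", model.n_iter_, "iterations")
```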

4.11 Causes of overfitting

Overfitting arises when the training dataset exhibits considerable noise and imperfections, or when the dataset is too small, so that only a fraction of the available data is used and the whole population is not adequately represented. When decision trees are evaluated for overfitting through validation metrics such as accuracy and loss, instances of overfitting can be discerned: these metrics typically improve until reaching a peak, after which they decline or plateau as the model strives to achieve an ever-closer fit. Addressing this issue requires striking a delicate balance between model complexity and the volume of available training data.

4.12 Identifying overfitting in ML models

Overfitting can be identified through various indicators, as outlined below:

1. Monitoring all losses is crucial; notable is when the validation loss increases while the training loss decreases.

2. Close attention should be paid to the machine learning curve, as any noticeable divergence between the training and validation curves suggests overfitting. Incorporating a regularization term into the loss function is essential to mitigate overfitting.

3. Visual inspection of the model's predictions is indispensable for assessing whether the model excessively conforms to the training data, which serves as a direct indicator of overfitting.

4.13 Preventing ML from overfitting

Several methods are available to mitigate overfitting in machine learning models, as outlined below:

1. Regularization techniques such as L1 and L2 are employed by augmenting the loss function with penalty terms. These techniques discourage the model from excessively tailoring itself to the training data, thereby mitigating overfitting.

2. Cross-validation serves as a pivotal method to prevent overfitting. By employing techniques such as early stopping, training halts just before the onset of overfitting, thereby ensuring that the model generalizes well to unseen data.

3. Another effective approach is data augmentation, which enriches the training dataset by generating synthetic data from existing samples. By exposing the model to a diverse array of data instances, data augmentation enhances its robustness against overfitting.

4. Feature selection reduces model complexity by excluding irrelevant or noisy features, thereby preventing overfitting to noisy data.

5. The dropout technique forces the model to learn more robust and generalizable representations, thus preventing data overfitting.

Implementing these techniques collectively diminishes the overfitting problem, resulting in the development of more precise and reliable machine learning models.

4.14 Fuzzy logic approach

The four main components of each fuzzy system are Fuzzy rule base, Fuzzification, Defuzzification and Inference Engine (Golafshani et al., 2020b). Figure 2 presents a general fuzzy logic model architecture.

Fig. 2

Fuzzy Logic Architecture

Fuzzification is the first step, in which crisp inputs are assessed and transformed into one or more fuzzy sets; Gaussian and trapezoidal membership functions are the two types most commonly employed. Within the framework of fuzzy logic, any element may belong to multiple subsets of the universal set. Fundamentally, fuzzy rules are structured as linguistic IF–THEN statements of the general form "IF A THEN B," where A and B are propositions involving linguistic variables: A denotes the condition (premise), while B signifies the consequence. Linguistic variables and fuzzy IF–THEN rules are used to account for ambiguity and imprecision, and different rule techniques are applied depending on the nature of the problem. In the fuzzy inference engine, the fuzzy rule base is applied to the fuzzified inputs to derive fuzzy outputs (Ben, Al-Asri, Zaher, Hafidi, Burtschell, 2022; Golafshani et al., 2020b). Subsequently, the defuzzification step converts the outputs of the inference engine into precise numerical values, i.e., crisp outputs.
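
The four stages can be illustrated with a minimal Mamdani-style sketch for a single input and a single output; the membership functions, rule base, and numerical ranges below are assumptions for demonstration and are not taken from the cited studies.

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function used for fuzzification."""
    return np.clip(np.minimum((x - a) / (b - a + 1e-9), (d - x) / (d - c + 1e-9)), 0, 1)

wb = 0.42                                         # crisp input: water-binder ratio
low_wb = trapezoid(wb, 0.25, 0.30, 0.40, 0.50)    # degree of "low w/b"
high_wb = trapezoid(wb, 0.40, 0.50, 0.60, 0.70)   # degree of "high w/b"

# Rule base (IF A THEN B):
#   IF w/b is low  THEN strength is high
#   IF w/b is high THEN strength is low
strength_axis = np.linspace(0, 100, 501)
high_strength = trapezoid(strength_axis, 55, 70, 100, 110)
low_strength = trapezoid(strength_axis, -10, 0, 30, 45)

# Inference engine: clip each consequent by its rule's firing strength, then aggregate.
aggregated = np.maximum(np.minimum(high_strength, low_wb),
                        np.minimum(low_strength, high_wb))

# Defuzzification: centroid of the aggregated fuzzy output gives a crisp value.
crisp_output = np.sum(strength_axis * aggregated) / np.sum(aggregated)
print(f"crisp strength score: {crisp_output:.1f}")
```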

4.15 Evaluating the performance of the GP model

In assessing the adequacy of a model, it is recommended that the ratio of the dataset size to the total number of input features be at least 3, with a preference for a ratio of 5. Validation of the GEP model involves employing various statistical computations across training, testing, and validation datasets. Evaluation focuses on determining the effectiveness of model training and establishing significant associations between the model and experimental data while minimizing errors. Key parameters such as RMSE, MAE, and RSE are computed during the testing phase (Ben, Al-Asri, Zaher, Hafidi, Burtschell, 2022; Golafshani et al., 2020b). Additionally, statistical techniques allow for external validation of the GEP model; a crucial criterion is that the slope of at least one of the regression lines through the origin (k or k') should closely approximate 1. The same dataset is utilized in the linear regression model of this review, which estimates the compressive strength of SCC, represented as fc. It is important to remember that the robustness and generalizability of the resulting model depend on the fitting parameters. The fitting parameters for the GEP algorithm are determined through test runs or experimental results and are contingent upon the population size (number of chromosomes), which dictates the duration of program execution. Population size levels are chosen based on the quantity and complexity of the prediction model. In this study, the algorithm used two variables, head size and number of genes, to define the model architecture (Babatunde et al., 2022; Mogaraju, 2023; Shariati et al., 2021). The head size, representing the size of the model's "head," determines the complexity of each term within the model, while the number of sub-ETs (elementary trees), the basic data structures comprising the model, is determined by the number of genes. Five alternative head sizes (8, 9, 10, 12, and 14) are considered in this review, each associated with either three or four genes (Mogaraju, 2023; Nguyen et al., 2019). The GEP algorithm is used to derive the precise parameters for each model. The flow chart for the GP is shown in Fig. 3.

Fig. 3

Genetic Algorithm Flow Chart Diagram

4.16 Model evaluation criteria

One commonly used performance indicator is the correlation coefficient (R). However, R is insensitive to the multiplication or division of output values by a constant, so it cannot be used as the sole indicator of how well the model predicts. Consequently, this study also assesses additional metrics, including the mean absolute error (MAE), relative squared error (RSE), and relative root mean square error (RRMSE) (Aci et al., 2018; Kiani et al., 2016; Nehdi et al., 2001; Othman, 2023; Tiza et al., 2023). To evaluate the model's performance with regard to both the RRMSE and R, Gandomi and Roke proposed a performance index (Golafshani et al., 2020b). Equations 1–7 provide the mathematical formulas for these error functions.

$$F_{c} = f\left( \alpha + \beta_{1} Y_{1} + \beta_{2} Y_{2} + \beta_{3} Y_{3} + \beta_{4} Y_{4} + \ldots + \beta_{n} Y_{n} \right) + \varepsilon_{1}$$
(1)

where Fc is the compressive strength, β1 to βn denote the regression coefficients, α is the regression constant, and ε1 is the error term.

$${\text{RMSE}} = \sqrt{\frac{\sum\nolimits_{i=1}^{n} \left( e_{i} - m_{i} \right)^{2}}{n}}$$
(2)
$${\text{MAE}} = \frac{\sum\nolimits_{i=1}^{n} \left| e_{i} - m_{i} \right|}{n}$$
(3)
$${\text{RSE}} = \sqrt{\frac{\sum\nolimits_{i=1}^{n} \left( m_{i} - e_{i} \right)^{2}}{\sum\nolimits_{i=1}^{n} \left( \bar{e} - e_{i} \right)^{2}}}$$
(4)
$${\text{RRMSE}} = \frac{1}{\left| \bar{e} \right|}\sqrt{\frac{\sum\nolimits_{i=1}^{n} \left( e_{i} - m_{i} \right)^{2}}{n}}$$
(5)
$${\text{R}} = \frac{\sum\nolimits_{i=1}^{n} \left( e_{i} - \bar{e} \right)\left( m_{i} - \bar{m} \right)}{\sqrt{\sum\nolimits_{i=1}^{n} \left( e_{i} - \bar{e} \right)^{2} \sum\nolimits_{i=1}^{n} \left( m_{i} - \bar{m} \right)^{2}}}$$
(6)
$$\rho = \frac{{\text{RRMSE}}}{1 + {\text{R}}}$$
(7)

Here, ei and mi denote the experimental and model-predicted values of the i-th sample, respectively, ē and m̄ their means, and n the number of observations.
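
The error functions of Eqs. (2)–(7) can be implemented directly, as in the sketch below; the interpretation of e as experimental values and m as model predictions follows the symbols above, and the numerical check uses made-up values.

```python
import numpy as np

def evaluation_metrics(e, m):
    """Error functions of Eqs. (2)-(7), with e = experimental values and
    m = model predictions (an interpretation of the symbols used above,
    not code from the reviewed studies)."""
    e, m = np.asarray(e, float), np.asarray(m, float)
    n = len(e)
    rmse = np.sqrt(np.sum((e - m) ** 2) / n)                            # Eq. (2)
    mae = np.sum(np.abs(e - m)) / n                                     # Eq. (3)
    rse = np.sqrt(np.sum((m - e) ** 2) / np.sum((e.mean() - e) ** 2))   # Eq. (4)
    rrmse = rmse / abs(e.mean())                                        # Eq. (5)
    r = (np.sum((e - e.mean()) * (m - m.mean()))
         / np.sqrt(np.sum((e - e.mean()) ** 2) * np.sum((m - m.mean()) ** 2)))  # Eq. (6)
    rho = rrmse / (1 + r)                                               # Eq. (7)
    return {"RMSE": rmse, "MAE": mae, "RSE": rse, "RRMSE": rrmse, "R": r, "rho": rho}

# Illustrative check with assumed experimental vs predicted strengths (MPa)
print(evaluation_metrics([32.5, 41.0, 47.8, 55.2], [33.1, 40.2, 48.5, 54.0]))
```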

4.17 Experimental setup of artificial neural network

The experimental setup focuses on evaluating the compressive and flexural strength of mixed design specimens. Samples undergo casting and subsequent testing at intervals of 7, 14, 21, and 28 days. Compressive strength is determined using cube samples, and flexural strength is determined using beam specimens. After 28 days of casting, cube samples are weighed to determine their density (Algaifi et al., 2021; Chou 2013; Imam et al., 2021; Ly et al., 2021). The flow chart for the ANN is shown in Fig. 4.

Fig. 4

ANN Architecture

References [72, 73, 74, 75] evaluate the feasibility of MARS and GEP models in predicting the 28-day compressive strength of SCC. In that research, compressive strength values were estimated using the multivariate adaptive regression spline approach [76, 77, 78, 79, 80, 81]. The findings indicated a high ability of both the GEP and MARS models to predict SCC strength (Figs. 5, 6 and 7; Table 1).

Fig. 5

Source: Milad, B and Valiollah, A: Civil Engineering Journal, Vol. 4, No. 7, July 2018

GEP Models in Concrete Strength Prediction.

Fig. 6

Source: Milad, B and Valiollah, A: Civil Engineering Journal, Vol. 4, No. 7, July 2018

Scatter Plot of Observed and Predicted Compressive Strength during the Training Phase of the GEP Model.

Fig. 7

Source: Milad, B and Valiollah, A: Civil Engineering Journal, Vol. 4, No. 7, July 2018

Scatter Plot of Observed and Predicted Compressive Strength during the Testing Phase of the GEP Model.

Table 1 Parameters of optimized GEP models

4.18 Gaps in knowledge

1. For AI algorithms to efficiently identify patterns and generate precise predictions, vast volumes of high-quality data are needed. However, due to variables such as variations in material composition, curing conditions, and testing methodologies, it can be difficult to gather enough information about the properties of self-compacting concrete (SCC), particularly compressive strength.

2. Developing precise prediction models requires identifying the most pertinent features or input variables that affect the compressive strength of SCC. To determine which factors have the greatest effect on strength, researchers may need to investigate different concrete mix designs, admixtures, curing times, and environmental conditions.

3. While AI methods such as deep learning can provide strong predictive capability, model interpretability is often sacrificed in the process. Understanding how input features contribute to the predicted compressive strength, and learning about the underlying relationships in the data, requires striking a balance between interpretability and model complexity.

4. AI models trained on historical data may struggle to generalize to novel situations or scenarios that deviate greatly from the training set. To ensure that prediction models can reliably handle differences in material qualities, environmental conditions, and construction processes, researchers need to investigate methods such as transfer learning, domain adaptation, and robust optimization.

5. To evaluate the dependability and generalizability of AI models, proper validation is essential. To properly assess the precision and resilience of their prediction models, researchers must use appropriate performance measures (such as mean absolute error and root mean square error) and validation strategies (such as holdout validation and cross-validation).

6. Measurement inaccuracy and other kinds of variability make predicting concrete strength fundamentally uncertain. AI approaches that use uncertainty quantification techniques, such as ensemble learning or Bayesian neural networks, can produce more accurate forecasts and uncertainty estimates, which are necessary for well-informed decision-making in engineering applications (a minimal sketch follows this list).
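
As referenced in item 6, a minimal bootstrap-ensemble sketch of uncertainty estimation is given below; the data, estimator, and ensemble size are illustrative assumptions rather than a method taken from the reviewed studies.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Train an ensemble on bootstrap resamples and report the spread of its
# predictions as a simple uncertainty estimate for a new, unseen mix.
rng = np.random.default_rng(5)
X = rng.random((150, 4))                                      # assumed mix features
y = 45 - 32 * X[:, 0] + 7 * X[:, 1] + rng.normal(0, 2, 150)   # assumed noisy target

x_new = np.array([[0.45, 0.60, 0.30, 0.50]])                  # a new, unseen mix
predictions = []
for i in range(50):                                           # 50 bootstrap members
    idx = rng.integers(0, len(X), len(X))                     # resample with replacement
    member = DecisionTreeRegressor(random_state=i).fit(X[idx], y[idx])
    predictions.append(member.predict(x_new)[0])

mean, std = np.mean(predictions), np.std(predictions)
print(f"predicted strength: {mean:.1f} MPa +/- {std:.1f} (ensemble spread)")
```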

5 Conclusions

From the findings of this review, it was observed that the application of artificial intelligence techniques in predicting the compressive strength of self-compacting concrete yields approximate strength values closely aligned with experimental investigations. The research indicates that models achieving R2 > 0.8 demonstrate a significant correlation between predicted values and experimental outcomes in the dataset. The review further noted that across all evaluated techniques, R2 consistently exceeded 0.8 for 28-day compressive strength, affirming the suitability of these models for accurate predictions. Particularly, the ANN model exhibited notable consistency when compared to experimental results of concrete compressive strength, thereby establishing ANN as a robust predictive tool suitable for both in situ and experimental predictions. Consequently, it is recommended for formulating various civil engineering properties requiring predictive capabilities. Artificial intelligence models offer significant time and resource savings by obviating the need for experimental tests, which can occasionally delay construction projects. Reinforcement learning techniques like Deep Deterministic Policy Gradient (DDPG) are more responsive to selections of hyperparameters than others. Through hyperparameter optimization, a set of parameters can be identified that optimizes model performance on test data, minimizing predefined loss functions. Overfitting, a common issue in machine learning where models perform poorly on new data due to over-adaptation to training data, can be mitigated through strategies such as cross-validation, data augmentation, dropout techniques, and careful feature selection.