Introduction

As documented in various standard codes, a typical RCC structural design must be checked against strength and durability or serviceability criteria. Strength criteria cover flexure, compression, shear, and torsion requirements, whereas serviceability criteria cover deflection and cracking [1,2,3,4]. Each of these requirements is, directly or indirectly, a function of the compressive strength of the concrete. It is therefore of utmost importance to ensure that the concrete used for construction achieves the compressive strength assumed while designing the structure. For this purpose, testing and estimating the compressive strength of in situ concrete is imperative.

The sampling frequency is defined in several codes; a general value is at least one sample for every 1–5 cubic meters of concreting work, and every batch of prepared concrete is tested [1]. Each concrete sample must be at least 0.02 cubic meters in volume and be taken from different points of the concreting work [5]; thus, the compressive test must be carried out multiple times throughout a project, following the procedures outlined in several codes [4, 6, 7]. In addition, the compressive strength of concrete can also be determined by non-destructive tests [8]. Although non-destructive tests do not require sampling and are quick to perform, they show higher dispersion and greater deviation from the true compressive strength [9]. The conventional techniques of compressive testing, meanwhile, require expensive equipment, skilled labour, and a dedicated testing facility. These requirements are challenging to meet in low-cost construction owing to the non-availability of funds, short and critical project spans, and the lack of a dedicated testing area.

The compressive strength of concrete is not a purely stochastic property; it depends on various factors such as the water-cement ratio [10], cement strength class [11], aggregates and admixtures [12], and specimen shape and size [9], as well as on the conditions under which the concrete is cast, such as the ambient temperature [14] and site quality control. This makes it a function of pre-defined parameters between which a relationship can be established. The literature shows that soft computing techniques are well suited to finding such relationships between parameters and a desired output in multiple fields [15,16,17,18,19], and that conventional modelling techniques fall short for complex concrete mixes. Soft computing techniques have been proven to predict the compressive strength of both a simple concrete mix [20] and a complex steel fibre-reinforced concrete mix [21]. Where conventional modelling struggled with the large number of forecasting variables in such complex mixes, soft computing techniques proved useful.

This paper highlights the research in this domain using AI and ML, bifurcating it into four methods: the first is the conglomerate of rule-based algorithms, the second is classical ML methods, including regression and classification techniques, the third is shallow neural networks, and the fourth is deep machine learning. The approaches reviewed in this study range from simple parametric methods to AI and ML techniques such as shallow neural network models, deep neural networks, and computer vision-based methods in which an image of the concrete surface [22] or microtomographic images of the concrete [23] serve as input. The computer vision-based methods employ convolutional neural network models to identify patterns in the images and pass them through a deep neural network or classical ML techniques to reach a satisfactory conclusion.

A successful prediction model should be scalable, i.e., its results should apply to large-scale concreting as well; it should be repeatable; and it should perform well on evaluation metrics. The performance of soft computing techniques is checked by various evaluation metrics such as root mean square error (RMSE), mean absolute error (MAE), the R2 value, and mean absolute percentage error (MAPE); these are discussed briefly in an upcoming section.

This paper aims to critically review the different soft computing techniques used to predict the compressive strength of concrete. Figure 1 shows the share of each soft computing technique across the literature reviewed in this study. The rest of the paper is organized as follows. Section II describes the working principles of some of the popular AI algorithms encountered in the literature. Section III describes the popular evaluation metrics used. Section IV describes the methodologies used by researchers and their capability to predict the compressive strength of concrete. Section V presents the conclusion and the future research scope.

Fig. 1 Pie chart of different algorithms in this study; all regression models are grouped as ‘REGRESSION’, all tree models (DT, RF, boosting trees and other similar algorithms) are grouped as ‘TREE MODELS’, neural network models except DNN and CNN are grouped as ‘NN’, and bio-inspired algorithms, GEP and GP are grouped as ‘BIO INSPIRED’

Artificial intelligence algorithms

Algorithms based on rules

Adaptive neuro-fuzzy inference system

Fuzzy networks can be optimized using backpropagation and other training functions implemented in a neural network system; such a system is called a neuro-fuzzy system. One of the methods for neuro-fuzzy inference is the Mamdani inference system, shown in Fig. 2. In the Mamdani inference system, a crisp input is applied to a fuzzy set, producing a fuzzy output set that is converted to a crisp output by de-fuzzification. The Mamdani fuzzy inference system has five layers (a minimal sketch follows the list), namely:

  i. Fuzzification layer: This layer is used for calculating the membership functions. There are different types of membership functions like the trapezoidal function, bell function, triangular function, Gaussian function, and left–right function, to list a few. For most cases, the Gaussian membership function is used.

  ii. Inference layer: The inference layer is the reasoning mechanism of the neuro-fuzzy system. A fuzzy rule can be defined as a relation between its antecedent (input) and its consequence (output). Based on this, fuzzy reasoning can be defined as a single rule with a single antecedent, a single rule with multiple antecedents, or multiple rules with multiple antecedents. Weights in the inference system are found using the T-norm operator.

  iii. Implication layer: The consequent membership functions are calculated based on the inference layer.

  iv. Aggregation layer: The implied results are summed up to give a fuzzy output.

  v. Defuzzification layer: Defuzzification is the process of converting the fuzzy output to a crisp output. In the Mamdani inference system, the centroid of the fuzzy output is calculated, and that value is the crisp output.
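To make the five layers concrete, the following is a minimal numeric sketch of one Mamdani-style pass, assuming a Gaussian input membership function, triangular consequent sets, the min T-norm, max aggregation, and centroid defuzzification; the two rules and all numbers are illustrative, not taken from any reviewed study.

```python
import numpy as np

def gaussian_mf(x, mean, sigma):
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

def triangular_mf(x, a, b, c):
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def mamdani_predict(crisp_input):
    y = np.linspace(0, 60, 601)             # output universe, e.g. strength in MPa

    # Fuzzification: degree to which the input is "low w/c" or "high w/c"
    mu_low  = gaussian_mf(crisp_input, mean=0.40, sigma=0.05)
    mu_high = gaussian_mf(crisp_input, mean=0.60, sigma=0.05)

    # Inference + implication: clip each consequent at its rule strength (min T-norm)
    strong = np.minimum(mu_low,  triangular_mf(y, 30, 45, 60))  # IF w/c low  THEN strength strong
    weak   = np.minimum(mu_high, triangular_mf(y,  0, 15, 30))  # IF w/c high THEN strength weak

    # Aggregation (max) and centroid defuzzification -> crisp output
    aggregated = np.maximum(strong, weak)
    return np.sum(y * aggregated) / np.sum(aggregated)

print(mamdani_predict(0.45))  # crisp strength estimate for a w/c ratio of 0.45
```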

Fig. 2 Mamdani Fuzzy Inference System

Fuzzy logic

A Boolean logic decision-making system takes only two values, true and false (1 and 0) [24]. These values can be put through several logic gates to get the desired results. However, this logic system does not capture any condition between true and false, for example, partially true, partially false, or true to some extent, nor various other linguistic syntaxes like negations, hedges, and connectives. These scenarios are well captured by fuzzy logic. Decision making by fuzzy logic includes a fuzzification module in which membership values are assigned to specific criteria or conditions. Membership values are like weights that describe to what extent a particular condition is valid. A relationship is formed between membership values using IF-THEN rules, linguistic rules, or conditional rules, which give fuzzy logic a human-like outlook. An inference system provides a de-fuzzified output from the relation established to get a crisp value.

Classical ML and clustering

Support vector machine (SVM)

Support vector machine is a classification algorithm used for applications like image classification, pattern recognition, and text categorization. The data is classified by constructing a hyperplane of (n−1) dimensions, where ‘n’ is the dimension of the vector used to describe a data point.

The best hyperplane is the one that ensures the maximum margin distance from the data points involved in the classification. Figure 3a shows a hyperplane and margin distance in SVM for classification purposes. The data points closest to the hyperplane are called support vectors. The kernel or classifier selection is essential for precise classification using SVM. The kernels in SVM are the polynomial, sigmoid, and radial basis kernels, which are non-linear, and the fourth, the linear kernel. SVM is thus capable of working with linear as well as non-linear data.
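As a minimal illustration of kernel choice, the sketch below fits scikit-learn's SVC with each of the four kernels named above on synthetic two-class data; the library, dataset, and parameters are assumptions for demonstration only.

```python
from sklearn import svm
from sklearn.datasets import make_classification

# Hypothetical toy data standing in for any two-class problem.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# The kernel determines the shape of the separating hyperplane.
for kernel in ("linear", "poly", "sigmoid", "rbf"):
    clf = svm.SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.score(X, y), "support vectors:", clf.support_vectors_.shape[0])
```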

Fig. 3 a Support Vector Machine (SVM) algorithm. b Support Vector Regression (SVR) algorithm. c K-nearest neighbour (KNN) algorithm

Support vector regression (SVR)

In simple regression problems, the final aim is to minimize the square of the error to ensure the best fit; in support vector regression, by contrast, the main concern is reducing the error only to a certain degree: a prediction is accepted if the predicted value, including the error, falls within an acceptable range. In SVR, the hyperplane is constructed as the best-fit line and is offset by an error tolerance ‘ε’ on both sides. The aim of SVR is that all data points fall within this tolerance region. For values outside the tolerance region, a slack variable ‘ξ’ denotes the deviation of a data point from the margins of the tolerance region, and this deviation is minimized. Figure 3b shows a schematic diagram of SVR. SVR and SVM work on the same principle of creating hyperplanes using a kernel; SVM is used for classification, whereas SVR gives a numerical answer.
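A hedged sketch of the ε-tube idea using scikit-learn's SVR, assuming an RBF kernel; the mix-design features and strengths are invented for illustration. The epsilon argument sets the half-width of the tolerance region, and C penalizes slack outside it.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical mix features -> strength (MPa); values are illustrative only.
X = np.array([[0.40, 350], [0.45, 330], [0.50, 310], [0.55, 300], [0.60, 280]])  # w/c, cement kg/m3
y = np.array([48.0, 44.5, 40.0, 36.5, 32.0])

# epsilon: width of the tolerance tube; C: penalty on points outside it (slack).
model = SVR(kernel="rbf", C=100.0, epsilon=1.0).fit(X, y)
print(model.predict([[0.48, 320]]))
```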

Multi-linear regression

Multi-linear regression is a multivariate algorithm for supervised machine learning. It aims at establishing the relationship between multiple input variables and a dependent output variable. This algorithm is validated by the least square error between the actual and predicted values. A general form of multi-linear regression is given in equation (1) [25].

$$y={\beta }_{0}+\sum_{i=1}^{n}{\beta }_{i}{x}_{i}+\varepsilon$$
(1)

where ‘y’ is the dependent variable on ‘n’ different independent variables, β0 is the constant, βi is the regression vector, and ε is the error.
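Equation (1) can be fitted directly by ordinary least squares; a minimal NumPy sketch on synthetic data (all values illustrative) follows.

```python
import numpy as np

# Least-squares fit of Eq. (1): y = beta0 + sum(beta_i * x_i) + error.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))                      # 50 samples, 3 hypothetical features
beta_true = np.array([2.0, -1.0, 0.5])
y = 5.0 + X @ beta_true + rng.normal(scale=0.1, size=50)

A = np.column_stack([np.ones(len(X)), X])          # column of ones carries beta0
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("beta0:", coef[0], "beta_i:", coef[1:])
```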

Non-linear regression

Non-linear regression is a multivariate, higher-degree algorithm for supervised machine learning. It establishes a relationship between the dependent variable and a polynomial multi-variable equation, and is used when the relationship between the dependent and independent variables is complex and not linear. A general form of non-linear regression is given in equation (2) [25].

$$y=\beta_{0}+\sum_{\substack{i=1 \\ m\in\mathbb{R}}}^{n}\beta_{i}x_{i}^{m}+\varepsilon$$
(2)

where ‘y’ is the dependent variable on ‘n’ independent variables of degree ‘m’, with m belonging to the real numbers \({\mathbb{R}}\), β0 is the constant, and ε is the error.
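One way to realize Eq. (2) in practice is to expand the inputs into polynomial terms and fit them linearly; the sketch below assumes scikit-learn and degree 2, with invented data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Polynomial (degree-2) regression in the spirit of Eq. (2); data is illustrative.
rng = np.random.default_rng(1)
X = rng.uniform(0.3, 0.7, size=(60, 1))            # e.g. hypothetical w/c ratios
y = 80 - 150 * X[:, 0] + 90 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=60)

# Expand x into [1, x, x^2], then fit the coefficients by least squares.
model = LinearRegression().fit(PolynomialFeatures(degree=2).fit_transform(X), y)
print(model.coef_, model.intercept_)
```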

Gaussian process regression

Gaussian process regression is based on the idea that infinitely many functions could fit the data points for a particular distribution of data points. A non-parametric regression model calculates a probability distribution over all functions that fit the data points. The Gaussian process uses a kernel-based probabilistic model. The Gaussian process for an input ‘x’ and corresponding output ‘y’ is given by Eq. (3) [25].

$$y=g{\left(x\right)}^{T}\alpha +f\left(x\right)+\varepsilon,\quad f(x)\sim N(\mu ,K)$$
(3)

where g(x) is a vector function that maps the input variables to a multi-dimensional space, α is the coefficient vector of g(x), and f(x) constitutes the Gaussian process, with µ being the mean vector and K being the covariance matrix. The covariance matrix can be expressed mathematically as in equation (4) [25].

$$K=\left[\begin{array}{ccc}K\left({x}_{1}, {x}_{1}\right)& \dots & K\left({x}_{1},{x}_{n}\right)\\ \vdots & \ddots & \vdots \\ K\left({x}_{n},{x}_{1}\right)& \dots & K\left({x}_{n},{x}_{n}\right)\end{array}\right]=\left[\begin{array}{ccc}1& \dots & K\left({x}_{1},{x}_{n}\right)\\ \vdots & \ddots & \vdots \\ K\left({x}_{n},{x}_{1}\right)& \dots & 1\end{array}\right]$$
(4)
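A brief sketch of kernel-based GPR, assuming scikit-learn's GaussianProcessRegressor with an RBF covariance plus a white-noise term; the data is a toy stand-in. The predictive standard deviation comes for free, reflecting the distribution over fitting functions described above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D data; the RBF kernel plays the role of K in Eq. (4).
X = np.linspace(0, 10, 25).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.default_rng(2).normal(scale=0.1, size=25)

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(0.01)).fit(X, y)
mean, std = gpr.predict([[5.5]], return_std=True)   # predictive mean and uncertainty
print(mean, std)
```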

Extreme gradient boosting (XGBoost)

Extreme gradient boosting is a decision-tree-based supervised learning technique that can be used for classification and regression. It employs a gradient boosting framework along with distributed or parallel processing. Gradient boosting alone suffers from problems like overfitting and longer computation time; hence, extreme gradient boosting evolved as a modified gradient boosting method whose training time is shorter. This method can model the non-linear interactions between the various features used in machine learning.
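A hedged sketch using the xgboost library's scikit-learn-style wrapper; the hyperparameters and data are illustrative only, and n_jobs enables the parallel processing mentioned above.

```python
import numpy as np
from xgboost import XGBRegressor

# Synthetic stand-in data: 200 samples, 5 hypothetical features.
rng = np.random.default_rng(3)
X = rng.uniform(size=(200, 5))
y = 30 + 20 * X[:, 0] - 10 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Boosted trees fitted stage-wise; max_depth limits each tree, n_jobs parallelizes.
model = XGBRegressor(n_estimators=300, learning_rate=0.1, max_depth=4, n_jobs=-1)
model.fit(X, y)
print(model.predict(X[:3]))
```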

K-nearest neighbour (KNN)

K-nearest neighbour assumes that a data point shares the characteristics of the majority of data points in its vicinity. It is a supervised machine learning technique used for classification and regression. A search space of potential solutions is searched locally within a radius such that the ‘k’ nearest neighbours are selected. Figure 3c shows the KNN algorithm for k=6 and k=12, highlighting the significance of the ‘k’ value. The value of ‘k’ is thus an important parameter that needs to be determined first; it should be neither too large nor too small. Once the value of ‘k’ is decided, the data point is classified based on its proximity to its ‘k’ neighbours. This degree of proximity is quantified by finding the Euclidean distance and then sorting the distances from smallest to largest. The Euclidean distance is given in equation (5) [25].

$$d=\sqrt{{\left({x}_{2}-{x}_{1}\right)}^{2}+{\left({y}_{2}-{y}_{1}\right)}^{2}}$$
(5)

The class label of the first ‘k’ neighbours is considered, and the label that occurs a maximum number of times is assigned as the class of the data point in consideration.
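The procedure just described (Euclidean distances per Eq. (5), sorted smallest to largest, then a majority vote among the first ‘k’ labels) can be written in a few lines of NumPy; the toy data below is illustrative.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k):
    # Euclidean distances per Eq. (5), sorted smallest to largest.
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = y_train[np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]   # majority label among k neighbours

X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X_train, y_train, np.array([2, 2]), k=3))  # -> 0
```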

Decision tree algorithm

The decision tree is a supervised learning algorithm that exposes existing patterns between predictors and a dependent variable using a combination of predictive rules. The classification technique generates decisions based on the information fed to it [26]. The schematic representation of the rules and their outcomes looks like a tree diagram. The decision tree algorithm is well suited for classification as well as regression tasks. It typically has three types of nodes: root, decision, and leaf nodes. The root node is the base node and typically the starting point of the algorithm.

Figure 4a shows the various decision tree components and bifurcations. The decision node is the node that determines the course for subsequent outcomes. A leaf node is a terminal node holding the outcome of a particular rule. Decision trees are sensitive to the data on which they are trained; hence, if the training data is changed, the structure of the decision tree alters. An advantage of the decision tree algorithm is that it can also be trained on noisy data. This algorithm is simpler than others but prone to overfitting; more complex tree-based algorithms like random forest and boosting trees are therefore often preferred over decision tree algorithms.
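A minimal sketch with scikit-learn's DecisionTreeRegressor, capping the depth to restrain the overfitting noted above; export_text prints the learned root/decision/leaf structure as IF-THEN rules. The data and parameters are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# A shallow tree keeps the rules readable and limits overfitting.
rng = np.random.default_rng(4)
X = rng.uniform(size=(100, 2))
y = np.where(X[:, 0] > 0.5, 40.0, 25.0) + rng.normal(scale=1.0, size=100)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree))   # root/decision/leaf nodes as IF-THEN rules
```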

Fig. 4 a Decision Tree algorithm. b Random Forest algorithm

Random forest

A random forest is an ensemble of different decision tree algorithms that form a forest and reduce the disadvantages of individual trees. Random forests are an improvement over bagged decision trees.

The bagged decision tree algorithm creates multiple sub-trees for multiple subsets from the original data. A classification or regression task for a data point is done by taking the average value of all the individual decision trees. Figure 4b shows the classification process of the random forest algorithm.

Bagged decision tree models work better when the correlation between sub-trees is low. This is not ensured in the bagged decision tree model; the random forest, however, ensures it by restricting the learning algorithm to a random subset of features at each split, instead of letting the learning algorithm search through all features as in the bagged tree model.
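A brief scikit-learn sketch: setting max_features below the total number of features is the random-subset restriction that decorrelates the trees relative to plain bagging. All numbers are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
X = rng.uniform(size=(300, 6))
y = 20 * X[:, 0] + 10 * X[:, 1] + rng.normal(scale=0.5, size=300)

# max_features < n_features decorrelates the trees; predictions average the forest.
rf = RandomForestRegressor(n_estimators=200, max_features=2, random_state=0).fit(X, y)
print(rf.predict(X[:2]))
```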

Genetic programming

Genetic programming is a guided random-search algorithm based on evolutionary techniques used to search the solution space for potential solutions. These algorithms are inspired by nature and are generally used for optimization problems. Genetic programming can solve unconventional problems that would require great computational effort with traditional approaches. A common framework starts by defining chromosomes, which are composed of genes; a gene is a piece of information contained in a chromosome. Genetic programming aims to search for a better solution than the previous one. For this purpose, the genes are altered by crossover, mutation, and best-gene selection. Genetic programming can also be coupled with other neural network systems to optimize specific parameters and yield better output. Leung et al. [27] tuned the structure of neural networks using an improved genetic algorithm.
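The selection/crossover/mutation loop can be sketched in a few lines; the following toy genetic algorithm over real-valued genes maximizes an illustrative fitness function and stands in for the general evolutionary framework, not for any specific reviewed method.

```python
import numpy as np

rng = np.random.default_rng(6)

def fitness(pop):
    return -np.sum((pop - 0.7) ** 2, axis=1)       # maximize: best genome is all 0.7

pop = rng.uniform(size=(40, 5))                    # 40 chromosomes, 5 genes each
for _ in range(100):
    parents = pop[np.argsort(fitness(pop))[-20:]]  # selection: keep the fittest half
    cut = rng.integers(1, 5, size=20)              # single-point crossover positions
    children = np.array([np.concatenate([parents[i % 20][:c], parents[(i + 1) % 20][c:]])
                         for i, c in enumerate(cut)])
    children += rng.normal(scale=0.02, size=children.shape)  # mutation
    pop = np.vstack([parents, children])

print(pop[np.argmax(fitness(pop))])                # best chromosome found
```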

Shallow neural network

Artificial neural network (ANN)

The artificial neural network is based on mimicking and learning from data in the way actual neurons in vertebrates do. In actual neurons, impulses flow from the dendrites to the cell body, which contains a nucleus. Figure 5a shows a typical artificial neuron. The input information from different features in an ANN is analogous to the dendrites; it is processed in a unit containing an evaluation function and an activation function. The input and output neurons are linked through weights and biases. The simplest form of ANN is a perceptron with two input neurons and one output neuron. Figure 5b shows a simple ANN architecture. Multiple layers of perceptrons can be used to create a shallow neural network with a hidden node layer between the input and output nodes. A typical epoch consists of updating the weights, and the iterations continue until final weight values are reached such that the weighted sum of information at the nodes gives the desired result.
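A minimal forward pass for the architecture of Fig. 5b (three input neurons, two hidden neurons, one output neuron), assuming sigmoid activations and random weights; it shows only the weighted-sum-plus-activation step, without training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(7)
x = np.array([0.5, 0.2, 0.9])                      # three input neurons
W1, b1 = rng.uniform(size=(2, 3)), np.zeros(2)     # input -> hidden weights and biases
W2, b2 = rng.uniform(size=(1, 2)), np.zeros(1)     # hidden -> output weights and biases

hidden = sigmoid(W1 @ x + b1)                      # weighted sum + activation
output = sigmoid(W2 @ hidden + b2)
print(output)
```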

Fig. 5 a A typical artificial neuron. b ANN architecture with three input neurons in the input layer, two hidden neurons in the hidden layer and one output neuron

Deep machine learning

Convolutional neural network

The convolutional neural network is a deep neural network widely used in computer vision for image recognition and pattern detection tasks. The convolutional neural network has three layer types: convolutional, pooling, and fully connected. The convolutional and pooling layers can occur in multitudes, but the end layer is always fully connected.

The convolutional layer consists of input data, a filter, and a feature map. For an image, the input can be the colour saturation value in each pixel, and the entire image is represented as a matrix of pixel information. The filter is a kernel, typically an array of weights that strides along with pixel information. These weights are multiplied with matrix value and added to form an output matrix. This output matrix is called a feature map or a convolved feature.

The pooling layer reduces dimensions by reducing the number of parameters. Pooling is primarily of two types: max pooling and average pooling. In max pooling, a filter is passed over the input, through which the maximum feature value is extracted to form the output layer. In average pooling, the average value of the features is taken to form the output layer.

In the convolutional and pooling layers, a filter restricts the connectivity of all input features to output features; thus, these layers can be described as partially connected layers. In the fully connected layer, the input nodes are translated through an activation function to an end output layer.
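A hedged sketch of such a stack using the Keras API (convolution, max pooling, then fully connected layers); the input size, filter counts, and the single regression output are illustrative assumptions rather than any reviewed study's architecture.

```python
import tensorflow as tf

# Conv + pooling stacks feeding a fully connected end, as described above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),                 # greyscale image of pixels
    tf.keras.layers.Conv2D(16, 3, activation="relu"),  # filters stride over the image
    tf.keras.layers.MaxPooling2D(2),                   # max pooling shrinks the feature map
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),      # fully connected end layers
    tf.keras.layers.Dense(1),                          # e.g. a predicted strength value
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```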

Deep belief network

A deep belief network suits both unsupervised and supervised machine learning tasks. A stack of restricted Boltzmann machines (RBMs) forms the deep belief network. Each RBM performs non-linear operations to create an output that is used as input by the next RBM in line. RBMs are utilized for unsupervised learning; every RBM layer is pre-trained to learn features in an unsupervised manner. Fine-tuning is then done by the backpropagation technique, which utilizes gradient descent; this is used for classification and regression purposes.

Training algorithms

Backpropagation

Backpropagation is a popular supervised training algorithm for neural networks. A single iteration in the forward direction consists of taking a weighted sum and passing it through an activation function to find the value at a hidden or output neuron. The backpropagation algorithm updates the weights while traversing back from the output to the input in the same iteration. The weight-updating equation between the output and the hidden neuron is given by equation (6) [19].

$$\Delta {V}_{jr}=\eta {O}_{j}{\delta }_{r},\quad {\delta }_{r}={O}_{r}\left(1-{O}_{r}\right)\left({t}_{r}-{O}_{r}\right)$$
(6)

where ‘j’ denotes the hidden node, ‘r’ corresponds to the output node, ∆Vjr is the weight change, Oj and Or are activation values at hidden neurons and output neurons, respectively, tr is the target value at the output neuron, and η is the learning rate. The weight updating equation between the hidden and input neuron is given by equation (7) [19].

$$\Delta {W}_{ij}=\eta {x}_{i}{\delta }_{j},\quad {\delta }_{j}={O}_{j}\left(1-{O}_{j}\right)\sum_{r} {V}_{jr}{\delta }_{r}$$
(7)

where ‘i’ denotes the input neurons, xi is the input value, and the summation of Vjr δr runs over the output neurons, Vjr being the updated weight between hidden neuron ‘j’ and output neuron ‘r’. These iterations are carried on until the error with respect to the target value is minimized.
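A single training iteration implementing Eqs. (6) and (7) for a small 3-2-1 network can be written directly in NumPy; the sizes, inputs, and learning rate below are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.5                                                    # learning rate
x = np.array([0.5, 0.2, 0.9]); t = np.array([1.0])          # input and target
rng = np.random.default_rng(9)
W = rng.uniform(-1, 1, size=(2, 3))                          # input -> hidden weights
V = rng.uniform(-1, 1, size=(1, 2))                          # hidden -> output weights

O_j = sigmoid(W @ x)                                         # forward pass
O_r = sigmoid(V @ O_j)

delta_r = O_r * (1 - O_r) * (t - O_r)                        # Eq. (6): output delta
delta_j = O_j * (1 - O_j) * (V.T @ delta_r)                  # Eq. (7): hidden delta

V += eta * np.outer(delta_r, O_j)                            # Delta V_jr = eta * O_j * delta_r
W += eta * np.outer(delta_j, x)                              # Delta W_ij = eta * x_i * delta_j
```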

Levenberg–Marquardt (LM)

The Levenberg-Marquardt algorithm was developed in the early 1960s for solving non-linear least-squares problems. In neural network training problems, the loss function is the least-squares error. For given data points (xi, yi), a model curve with parameters βi is to be found such that the sum of the squared errors between the actual data and the model curve is minimal. The minimization is an iterative process that requires an initialization value. The Levenberg-Marquardt algorithm combines the Gauss-Newton method and the gradient descent method. Gradient descent reduces the sum of squared errors by updating the parameters along the steepest descent, whereas the Gauss-Newton method reduces it by assuming the error function is locally quadratic and minimizing that quadratic. When the parameters of the model curve are far from their optimum values, LM acts more like gradient descent, and when they are near the optimum, LM works more like the Gauss-Newton method.
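In practice, LM is available off the shelf; the sketch below assumes SciPy's least_squares with method="lm" and fits an illustrative exponential model curve to noisy synthetic data.

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data generated from y = 2.5 * exp(-1.3 x) plus noise.
x = np.linspace(0, 4, 30)
y = 2.5 * np.exp(-1.3 * x) + np.random.default_rng(10).normal(scale=0.02, size=30)

def residuals(beta):
    return beta[0] * np.exp(beta[1] * x) - y       # model minus data

fit = least_squares(residuals, x0=[1.0, -1.0], method="lm")  # Levenberg-Marquardt
print(fit.x)                                       # recovered parameters ~ [2.5, -1.3]
```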

Restricted Boltzmann machine (RBM)

The restricted Boltzmann machine is a training algorithm generally used in deep neural networks. It is derived from the Boltzmann machine. In an RBM, the neurons are bifurcated into two groups: visible neurons, where the input is given, and hidden neurons. The visible and hidden neuron groups are forbidden to have any connections within a group; they only have inter-group connections. This is in contrast to the Boltzmann machine, where all neurons are connected, making RBMs easier to implement and more efficient to train. The hidden neuron group is used to capture the probability of features. Random weights are initialized, and then Gibbs sampling is performed, simultaneously updating the weights of all neurons in a layer.
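A short sketch using scikit-learn's BernoulliRBM, which trains with contrastive divergence based on Gibbs-style sampling; the binary toy data and hyperparameters are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Toy binary data: 100 samples with 16 visible units.
X = np.random.default_rng(11).integers(0, 2, size=(100, 16)).astype(float)

rbm = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20, random_state=0)
H = rbm.fit_transform(X)            # hidden-unit activation probabilities (100 x 8)
print(H.shape)
```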

Performance and evaluation metrics

Applying an ML algorithm to a specific problem depends on how well it can compute the outcome. To quantify this performance, specific evaluation metrics are defined. This section summarises the ten evaluation metrics most used in the literature studied for this paper; a compact computational sketch of these metrics is given at the end of this section. The performance of the works surveyed in the subsequent sections is also reported against these ten evaluation metrics; other, scarcely used evaluation metrics are excluded.

A-20 index

The A-20 index is the ratio of M20 to the total number of samples in the dataset.

$$A20=\frac{M20}{M}$$
(8)

where M20 is the number of samples for which the ratio of the experimental value to the value predicted by the algorithm does not deviate more than ±20% from 1, i.e., lies between 0.8 and 1.2, and ‘M’ is the total number of data samples. For perfect performance, the A20 index equals 1.

Mean absolute error (MAE)

The nomenclature for the following subsections remains as follows: \({x}_{i}\) = actual value of the ith data point; \({\widehat{x}}_{i}\) = predicted value of the ith data point; \({x}_{mean}\) = average of the actual data over all samples; n = total number of data samples.

The mean absolute error (MAE) is the statistical mean of the absolute difference between the predicted and actual values, i.e., the error. The mean absolute error can be described as given in equation (9) [25].

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|{\widehat{x}}_{i}-{x}_{i}\right|$$
(9)

For a perfect correlation, the MAE should be 0.

Mean absolute percentage error (MAPE)

It is the statistical mean of the percentage of error with respect to the actual value of the data [25], given as follows.

$$MAPE=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{{\widehat{x}}_{i}-{x}_{i}}{{x}_{i}}\right|*100$$
(10)

Mean squared error (MSE)

Mean squared error is the mean value of the error squared; this can be considered the second moment of the error about its origin. This helps in understanding the variance and bias of the estimates [25].

$$MSE=\frac{1}{n}\sum_{i=1}^{n}{\left({\widehat{x}}_{i}-{x}_{i}\right)}^{2}$$
(11)

Regression coefficient (R)

The regression coefficient indicates the correlation between the expected and actual outcomes. The regression coefficient lies between −1 and +1, where −1 indicates a perfectly inverse correlation, whereas +1 represents a perfect positive correlation.

Root mean squared error (RMSE)

Root mean squared error is a non-negative value describing the predictive performance of the model. RMSE is useful when large errors are undesirable since the errors are squared before taking their average. RMSE value of 0 indicates the predicted and actual values coincide. It is the square root of the mean square error and is given as follows [25].

$$RMSE= \sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left|{\widehat{x}}_{i}-{x}_{i}\right|}^{2}}$$
(12)

RMSE-standard deviation ratio (RSR)

RSR is the ratio of root mean squared error to the standard deviation of the original data [25].

$$RSR= \frac{RMSE}{\sqrt{\frac{1}{n}{\sum_{i=1}^{n}\left({x}_{i}-{x}_{mean}\right)}^{2}}}$$
(13)

R-squared

The R-squared value determines to what extent the predicted and actual values fit. An R2 value of 1 indicates the model can perfectly predict the outcome, whereas an R2 value of 0 means the model fails to predict [25].

$${R}^{2}=1-\frac{\sum_{i=1}^{n}{\left({x}_{i}-{\widehat{x}}_{i}\right)}^{2}}{\sum_{i=1}^{n}{\left({x}_{i}-{x}_{mean}\right)}^{2}}$$
(14)

Variance account for (VAF)

Variance Accounted For compares the variance of the prediction error with the variance of the actual data, as in equation (15). It is often expressed as a percentage, and the ideal value of VAF is around 100% [25].

$$VAF\%=\left(1-\frac{var\left({x}_{i}-{\widehat{x}}_{i}\right)}{var\left({x}_{i}\right)}\right)*100$$
(15)

Weighted mean absolute percentage error (WMAPE)

WMAPE is a variant of MAPE in which the error for the ith data point is weighted according to the magnitude of that data point; thus, an error on higher-magnitude data carries greater weight in the total error computation. The weighted mean is taken by multiplying each percentage error by the corresponding actual value and dividing by the sum of the actual data [25].

$$WMAPE= \frac{\sum_{i=1}^{n}\left|\frac{{x}_{i}-{\widehat{x}}_{i}}{{x}_{i}}\right|*{x}_{i}}{\sum_{i=1}^{n}{x}_{i}}*100$$
(16)
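For reference, the metrics defined in Eqs. (8)–(16) can be computed in one small NumPy helper; the arrays below are illustrative, and strictly positive actual values are assumed wherever a ratio is taken (A20, MAPE, WMAPE).

```python
import numpy as np

def evaluate(x, x_hat):
    """Metrics of Eqs. (8)-(16) for actual values x and predictions x_hat (1-D arrays)."""
    err = x_hat - x
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    return {
        "A20":   np.mean((x / x_hat >= 0.8) & (x / x_hat <= 1.2)),       # Eq. (8)
        "MAE":   np.mean(np.abs(err)),                                    # Eq. (9)
        "MAPE":  np.mean(np.abs(err / x)) * 100,                          # Eq. (10)
        "MSE":   mse,                                                     # Eq. (11)
        "RMSE":  rmse,                                                    # Eq. (12)
        "RSR":   rmse / np.std(x),                                        # Eq. (13)
        "R2":    1 - np.sum(err ** 2) / np.sum((x - np.mean(x)) ** 2),    # Eq. (14)
        "VAF%":  (1 - np.var(x - x_hat) / np.var(x)) * 100,               # Eq. (15)
        "WMAPE": np.sum(np.abs((x - x_hat) / x) * x) / np.sum(x) * 100,   # Eq. (16)
    }

x = np.array([30.0, 42.0, 55.0, 38.0])        # actual strengths (illustrative)
x_hat = np.array([31.5, 40.0, 54.0, 41.0])    # predictions
print(evaluate(x, x_hat))
```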

AI for compressive strength prediction

This section bifurcates the algorithms into four subsections and discusses the individual techniques' methodologies, results and performance evaluation. The number of data points affects the model's performance depending upon the type of machine learning technique employed. Figure 6 shows the number of times different prominent features were used as input parameters in the AI-ML studies.

Fig. 6 Bar graph showing prominent features used in all studies. The water-reducing admixtures (superplasticizer, plasticizer, high-range water-reducers, and similar) are grouped as ‘WRA’. The non-destructive tests are grouped as ‘NDT’, chemical properties of cement are grouped as ‘Cement chemical properties’. Microstructure studies, including microstructural imaging, are grouped as ‘Microstructural study’. Images other than microstructural images are grouped as ‘Surface Images’

Algorithms based on rules

In rule-based algorithms, the inputs are represented by suitable membership functions, to which rules are applied and relationships generated. The output of this relationship is de-fuzzified to produce a crisp output. This section discusses various papers utilizing fuzzy logic and combinations of fuzzy logic with neural networks.

Iqtidar et al. [28] predicted the compressive strength of concrete blended with rice husk ash using ANFIS, ANN, MNLR, and LR. A Gaussian membership function was used to fuzzify a total of 192 input data points. The input data, collected from Google Scopus, comprised rice husk ash, age, superplasticizer, water, cement amount, and aggregates; these data were sent to nodes that multiplied the inputs by certain factors to generate weights for the next layer. The output weights were normalized to get the final weights, which were then applied to adaptive square nodes, each a function of the input parameters. The result was then sent to an aggregation layer to produce a crisp output. ANFIS performed better than LR and MNLR in this experiment, with an R2 value of 0.89.

Tayfur et al. [29] predicted the compressive strength of high strength concrete using fuzzy logic and ANN. A total of 60 sets of data were produced using different binder contents prepared from the various percentages of silica fumes. Input data of silica fumes content, binder content, and age, and the output of compressive strength was fuzzified using the triangular membership function. Twenty-four fuzzy rules were defined, and the minimum operation was performed for inference to find the fuzzy output set.

The centroid method aggregated and de-fuzzified the outputs to get a crisp output. Fuzzy logic performed appreciably better than ANN, with a regression coefficient of 0.95; FL also outperformed ANN on the MAE and RMSE evaluation metrics.

Abolpour et al. [30] compared the efficiency of fuzzy logic defined with the triangular membership function and with the Gaussian membership function in predicting the compressive strength of concrete. A data set of 1030 concrete mixtures from the University of California was used as the input for fuzzification. The input variables were the weight percentages of cement, water, blast furnace slag, fly ash, superplasticizer, and fine and coarse aggregate, plus the concrete age in days, the output being the concrete compressive strength. 897 fuzzy rules and five linguistic values were defined. The triangular membership function, with an MAE of 11.72%, performed better than the Gaussian membership function, with an MAE of 13.27%.

Sarıdemir et al. [31] predicted the long-term effect of GGBFS on the compressive strength of concrete using fuzzy logic and ANN. The compressive strengths were obtained from literature data and were verified using fuzzy logic and ANN. The input data included the specimen’s age, water, aggregate, Portland cement, and GGBFS, and was fuzzified using the triangular membership function. The product method was used as the inference operator, and the weighted average method was used for de-fuzzification. The FL output correlated with the actual compressive strength with R2 values of 0.99 and 0.97 for testing and training, respectively.

Özcan et al. [32] compared ANN and fuzzy logic for predicting the compressive strength of silica fume concrete. Forty-eight concrete mixtures were produced using four different water-cement ratios, three different cement dosages, and three partial silica fume replacement ratios. A triangular membership function fuzzified the inputs. Nine rules for compressive strength were formed. Mamdani’s inference system was used to get the crisp output. The experimental compressive strength was compared with the predicted compressive strength. The fuzzy logic showed promising outcomes for predicting the compressive strength of silica fume concrete with an R2 value of 0.93.

Topcu and Sarıdemir [33] used ANN and fuzzy logic to predict the compressive strength of concrete blended with different fly ashes having high and low lime percentages for 7, 28, and 90 days. The triangular membership function fuzzified the inputs. One hundred eighty experimental data were used, and Sugeno-type fuzzy rules were established to infer input and output. The product method was used as an inference operator. The crisp output showed a very high correlation with the experimental output, with an R2 value of 0.99 for training and testing.

Akkurt et al. [34] developed a fuzzy logic model to predict 28th-day cement compressive strength and compared it with the ANN method. The data used for training the algorithm were taken from a cement plant in Izmir, Turkey. A triangular membership function fuzzified fifty sets of input data. The inference system was defined with the product operator using the Mamdani fuzzy rules. The ANN was constructed with a 4-4-1 architecture, i.e., four neurons in the input layer, 4 in the hidden layer, and 1 in the output layer. The average error for the fuzzy logic model was 2.69%; this could have been improved by having more than four input parameters. The ANN model performed better, with an RMSE value of 1.7. Table 1 summarises the rule-based algorithms reviewed in this paper with respect to the number of samples used for training and the input features used in training. Table 2 summarises the rule-based algorithms with respect to the evaluation metrics and their performance.

Table 1 Summary of Rule-based algorithms w.r.t features and data points
Table 2 Summary of Rule-based algorithms with respect to the evaluation metrics

Classical ML and clustering

The classical machine learning and clustering section covers classification and regression algorithms like the support vector machine (SVM) and support vector regression (SVR), various regression techniques, optimization techniques like genetic algorithms, particle swarm optimization and ant colony optimization, and tree and ensemble techniques like the random forest, boosting trees and other similar methods. The description of the methods, the features used, the number of data samples and the data sources is summarised in Table 3. The summary of the performance evaluation is given in Table 4.

Table 3 Summary of Classical ML and Clustering algorithms w.r.t features and data points
Table 4 Summary of Classical ML and Clustering-based algorithms with respect to the evaluation metrics

Güçlüer et al. [35] compared the performance of ANN, DT, SVM and LR in predicting concrete compressive strength. They utilized results from the non-destructive ultrasonic pulse velocity test to find the compressive strength and used the outputs for training the algorithms. NDT was used to test 522 concrete cubes of standard 150mm size with compressive strengths ranging from 35.07 MPa to 53.44 MPa. The M5P decision tree developed by Wang and Witten was used in this study. The DT algorithm provided the best results, with a regression coefficient of 0.86, an MAE of 2.59 MPa and an RMSE of 3.77 for 28-day compressive strength.

Salami et al. [36] predicted the compressive strength of ternary-blend concrete using the least-squares support vector machine (LSSVM) coupled with simulated annealing. A comparison was made with the genetic programming (GP) method, used as a performance benchmark. The dataset from the University of California, Irvine repository was used, with 1030 data values for training and testing the algorithms. The ternary blend consists of cement, blast furnace slag and fly ash. LSSVM coupled with simulated annealing outperformed GP with R2 values of 0.982 and 0.954 for training and testing, respectively, while GP had R2 values of 0.86 and 0.89 for training and testing, respectively.

Kang et al. [21] predicted the compressive and flexural strength of steel fibre-reinforced concrete using 11 machine learning techniques and compared their performance. These techniques were the Gradient boosting regressor, XGBoost, RF regressor, AdaBoost, DT, KNN, LR, Ridge, SVR, Lasso, and MLP. The data for training and testing the algorithms were taken from different works in the literature. The water-cement ratio and the silica fume content were the most prominent factors in the prediction of compressive strength, whereas the volume of steel fibre and the silica fume content were more responsible for the prediction of flexural strength. Extreme gradient boosting (XGBoost), the RF regressor and the DT regressor performed better than the other soft computing techniques.

Ahmad et al. [37] used the supervised machine learning techniques of bagging, GEP, AdaBoost and DT and compared their performance in predicting the compressive strength of concrete blended with cementitious material. Statistical analysis was performed, and the evaluation metrics RMSE, R2, MAE and MSE were determined. In addition, the algorithms were validated using K-fold cross-validation: 1030 sets of values were distributed randomly and split into ten groups, of which nine were used for training and one for validation. This was repeated ten times, and the evaluation metrics were averaged to obtain the performance of the algorithms. Bagging performed best in both performance evaluation methods, followed by AdaBoost, GEP and DT.

Ahmad et al. [38] compared the performance of supervised machine learning algorithms in predicting the compressive strength of concrete at high temperatures; exposing concrete to high temperatures significantly affects its internal structure and compressive strength. The ML techniques adopted in this paper were bagging, DT, ANN and Gradient boosting, with nine input parameters: fine aggregates, coarse aggregates, temperature, silica fumes, cement, fly ash, nano-silica, water, and superplasticizer. DT and ANN were used as individual algorithms for forecasting the compressive strength at high temperatures, while bagging and Gradient boosting were used as ensembles. Statistical evaluation and K-fold cross-validation determined that bagging performed better than the other algorithms with an R2 value of 0.9, followed by Gradient boosting, DT and ANN, with R2 values of 0.89, 0.84 and 0.82, respectively.

Kova et al. [39] developed a model for predicting the compressive strength of self-compacting rubberized concrete using MLP-ANN, an MLP-ANN ensemble, bagging, RF, boosting trees, SVR and GPR. There were ten inputs and a single output (compressive strength). For the bagging method, the maximum number of trees was limited to 500 and the minimum leaf size was kept between 2 and 15; the learning curve for bagging was seen to saturate at 269 trees. For the RF method, the maximum number of trees was limited to 500, and the number of subsets for splitting was kept between 2 and 9. In boosting trees, the maximum number of trees was limited to 100 because of overtraining. The SVR model was analysed using linear, RBF and sigmoid kernels; the RBF kernel performed best of the three. In the GPR method, standardization was done using a Z-score (mean 0, variance 1), and constant base function models were analysed.

The MNLR and LR techniques used by Iqtidar et al. [25] to predict the compressive strength of rice husk ash concrete performed below average compared with the other models used; LR performed the worst with an R2 value of 0.63, and MNLR had an R2 value of 0.7.

Khursheed et al. [40] predicted the compressive strength of fly ash concrete using MPMR, RVM, GP, ENN and ELM. For MPMR, the Gaussian noise was kept at 0.002 and the other tuning parameter at 1.2; these were determined by trial and error. In the RVM model, the width of the radial basis kernel was determined using posterior modelling scrutiny, and the free parameter in the RBF kernel was determined as 0.04. For GP, the population size was kept at 800, the number of generations at 400, and the frequency of generations at 50. The anxiety parameter in the ENN model was kept at 0.34, and the confidence parameter at 0.66. The ELM model worked optimally with six hidden neurons in a single layer, and the radial basis function was used as the activation function. MPMR performed the best, followed by RVM, GP, ENN and ELM in decreasing order.

Salimbahrami and Shakeri [41] compared the performance of SVM and ANN in predicting the compressive strength of recycled aggregate concrete; these were further compared with MLR using K-fold cross-validation. In this study, natural aggregate concrete (the control specimen) and recycled aggregate concrete (with and without fibre) were produced in the lab and tested for 7- and 28-day strength. Workability was improved using a poly-carboxylic ether-based superplasticizer. In the natural aggregate mix, the fine aggregate was river sand, and the coarse aggregates were crushed gravels with a maximum size of 19mm. The influence parameter of the RBF kernel for SVM was kept at 0.008, the number of support vectors was 48, and the regularisation parameter C was kept at 1000. A multiple linear regression model related the output compressive strength to the six input parameters described in Table 3. The input layer of the ANN consisted of 6 neurons, the optimum number of neurons in the hidden layer was determined to be 7, the log-sigmoid function was used as the activation function, and backpropagation was used for training. SVM and ANN performed the best, with SVM having a marginal advantage over ANN, while MLR performed moderately well.

Rizvon and Jayakumar [42] predicted the compressive strength of recycled coarse aggregate concrete and correlated it with the hardened properties of the concrete. Five replacement ratios of coarse aggregate (0 to 100%) and varying water-cement ratios (0.3, 0.4 and 0.48), along with a superplasticizer, were used to prepare 15 mixes. Multi-linear regression with six independent variables was used to predict the compressive strength. An XGB model was also employed to predict cylinder and cube compressive strength; the XGB model was found to be 0.5% better than MLR in predicting the compressive strength of the cylinders.

Asteris et al. [43] predicted the compressive strength of cement-based mortar using KNN, SVM, RF, DT and AdaBoost. The optimum value of ‘k’ in KNN was determined as 3, and the Minkowski function was used in training. This study used the v-SVM, and a sigmoid kernel was used to build SVM. The number of trees for RF was kept at 10, and the split subset limit was greater than 5. The split subset limit for DT was greater than 6, and the minimum number of instances in DT was kept at 2, with a maximum tree depth of five. The number of estimators for AdaBoost was kept at 30. AdaBoost performed the best with an R2 value of 0.95 for testing, followed by RF, KNN, DT and SVM with testing R2 values of 0.94, 0.87, 0.85 and 0.42, respectively.

Feng et al. [44] compared the performance of AdaBoost with ANN and SVM in predicting the compressive strength of concrete. To implement AdaBoost, data for 1030 input and output parameter sets were collected, a strong learner was generated and validated, and finally the model was applied to output the compressive strength. Classification and regression trees were used to generate the weak learners, which were integrated by taking the median of the weak learners. 90% of the data was used as the training set and 10% as the testing set. The data was also trained using ANN and SVM. AdaBoost performed better than ANN and SVM, with an R2 value of 0.982, followed by ANN with 0.903 and SVM with 0.855.

Sihag et al. [45] estimated the 28-day compressive strength of high strength concrete using GPR, SVM and ANN. The GPR and SVM had two models each, based on the Pearson kernel function and the radial basis function. In GPR, the Gaussian noise was kept at 0.1; in SVM, the regularisation parameter C was kept at 10. A sensitivity analysis performed by eliminating one input in every trial revealed that cement and silica fume are the most critical parameters in determining the output. The statistical analysis showed that ANN performed better than GPR and SVM, and that the Pearson kernel performed better than the RBF kernel.

Nyarko et al. [46] predicted the compressive strength of rubberized concrete using KNN, regression trees (RT), RF and ANN. A database of 457 data points from the literature was used for training and testing. For KNN, five-fold cross-validation over values of ‘k’ from 1 to 20 gave an optimum ‘k’ of 3. For RT, the minimum parent size was varied between 1 and 40, and five-fold cross-validation determined the optimum minimum parent size as 3. The optimum number of trees in the RF model was 21. For ANN, three hidden layers were used with 9, 3 and 2 hidden neurons, respectively; a tan-sigmoid function was used as the activation function, and the Levenberg-Marquardt training algorithm was used to update the weights. The ANN outperformed the other methods in the testing phase with an R-value of 0.978, followed by RT, RF and KNN with R-values of 0.914, 0.9 and 0.88, respectively.

Dutta et al. [47] compared the performance of GPR, MARS, and MPMR in predicting the compressive strength of concrete. A total of 1030 data sets were used in this study, 80% for training and 20% for testing. For GPR, the Gaussian noise was kept at 0.1 and the sigma parameter at 0.001. The MARS model was built using 70 basis functions. For MPMR, a radial basis function was used. The MARS model performed best, with a testing regression value of 0.957, followed by GPR and MPMR with regression values of 0.9485 and 0.9352, respectively.

Chopra et al. [48] compared the performance of DT, RF and ANN in predicting the compressive strength of concrete at 28, 56 and 91 days. A decision tree model with 33 nodes was built, the number of observations used to build the RF model was 33, and the ANN had an architecture of 6 inputs, ten hidden nodes in the first hidden layer and one output. The performance evaluation suggested that ANN performed best, with an R2 value of 0.95 in the testing phase, followed by RF and DT with R2 values of 0.84 and 0.69, respectively.

Mirzahosseini et al. [49] predicted the compressive strength of glass cullet-modified concrete using GEP. For this purpose, seventy 50mm mortar cubes with glass powder, constant water-cement and sand-to-cementitious ratios, and varying glass powder replacement ratios were cast in the laboratory. Compressive strength was determined at 1, 7, 28 and 91 days. The study used two GEP models; the first model covered two curing temperatures of 23 and 50 degrees Celsius, for which all test results were included. The second model considered three curing temperatures of 10, 23 and 50 degrees Celsius, valid for 45 of the 70 tests. Statistical performance evaluation suggested GEP performed exceptionally well, with a regression value of 0.95.

Yaseen et al. [50] predicted the compressive strength of lightweight foamed concrete using ELM and compared its performance with MARS, the M5 tree model and SVR. Forty-six sets of input and output data were gathered from the literature. The MARS model was built using the cubic function for better smoothness, and recursive partitioning regression was used to optimize the function approximation. The M5 tree model utilized a split value of 2, a smoothing value of 15 and a splitting threshold of 0.05. The SVR model was built with an RBF kernel. The ELM model comprised three layers in the network architecture, with sigmoid as the activation function. An optimization study was done to find the optimal number of hidden neurons (between 50 and 150); the model was run 500 times, and the mean square error was used to evaluate performance. The performance evaluation of all the models suggested ELM performed the best.

Basyigit et al. [51] predicted the compressive strength of concrete by image processing trained using LR, MLR, and NLR. For this, 28 cubes of 100mm were cast with seven different water-cement ratios (0.37 to 0.79), four cubes per ratio; three cubes were tested for compressive strength, and the fourth was used to take the image. The images were cropped to 2350x2350 pixels, and a histogram of the pixel values was plotted. Threshold values for aggregate, cement and air gap were defined. The theoretical and obtained values were trained using the different methods to obtain the best regression. Multi-linear regression and non-linear regression performed with almost equal efficacy.

The compressive strength prediction of cement by double-layer MEP was proposed by Akkurt et al. [52]; in double-layer MEP, the chromosomes are placed in a double layer instead of a single layer and are thus capable of holding more information and expressions. Crossover, mutation and fitness evaluation are then performed. The performance evaluation of all the methods suggested that double-layer MEP performed better than the others, with an RMSE value of 1.47.

Baykasoglu et al. [10] predicted cement compressive strength using GEP and NN and compared them with regression analysis. Four months (104 days) of data were collected from a cement plant in Adıyaman, Turkey. Nineteen input features were used in this study, as described in the table below; the output was the 28-day strength. Gene expression programming was executed with an initial population size of 50, and between 3000 and 20,000 generations were generated. Subsets of the 19 input features were defined with relations, and their performance was checked; the best regression value for GEP was 0.775.

Shallow neural network

This section discusses the shallow neural network models used by researchers to predict the compressive strength of concrete. Shallow neural networks are characterized by a single hidden node layer, while deep neural networks have more than one hidden node layer [55]. This section discusses neural network models like ANN, MLP, MFNN and similar algorithms. The description of the methods, the features used, the number of data samples and the data sources is summarised in Table 5. The summary of the performance evaluation is given in Table 6.

Table 5 Summary of Shallow Neural Network Algorithms w.r.t features and data points
Table 6 Summary of Shallow Neural Network algorithms with respect to the evaluation metrics

A three-layered feed-forward artificial neural network was used by Güçlüer et al. [35] to predict the compressive strength of concrete. The hidden layer consisted of 3 nodes, and the sigmoid function was used as an activation function. They used a learning rate of 0.3. ANN performed second best after DT with a regression value of 0.856.

Kang et al. [21] used MLP to predict the compressive strength of steel fibre-reinforced concrete. The MLP was modelled with eight inputs and two hidden layers. Keras was used to build the MLP model, with backpropagation as the training function. ReLU was used as the activation function for the internal hidden layers, and the Adam function was used as the optimizer. The MLP performed with an RMSE value of 3.41 MPa and an MAE of 2.79 MPa.

Asteris et al. [56] developed a hybrid ensemble model (HENSM) to predict the compressive strength of concrete, constructed by combining four conventional machine learning algorithms using ANN. The conventional ML models were MARS linear, MARS non-linear, GPR and MPMR. The University of California dataset repository was used to obtain 1030 sets of 6 inputs and one output of compressive strength. Around 721 data sets (approx. 70%) were utilized for training and the remainder for validation. For ANN, only one hidden layer, with 20 neurons, was considered, achieving the best result. The maximum number of basis functions for the MARS linear and non-linear methods was taken as 29; the non-linear method used a cubic function. For GPR, the width of the radial basis function was determined by trial as 0.4, and the Gaussian noise was taken as 0.07. For MPMR, the radial basis width was determined as 0.7 and the noise as 0.0047. The HENSM outperformed all the individual ML algorithms, achieving R2 values of 0.99 and 0.89 in training and testing, respectively. A score analysis ranked the performance of all models; HENSM scored highest, followed by GPR, MARS – L, ANN, MARS – C and MPMR in decreasing order.

The MLP-ANN and an MLP-ANN ensemble were utilized by Kova et al. [39] to predict the compressive strength of self-compacting rubberized concrete. The MLP-ANN was trained using the LM training algorithm; the input layer consists of 10 nodes, while the output layer has one node (compressive strength). The optimum number of nodes in the hidden layer was found by trial and error to be 8, with a sigmoid activation function in the hidden layer and a linear activation function in the output layer. Ensemble models were also created for generalization purposes, with 100 base models having up to 16 nodes in the hidden layer.

The ANN model used to predict the compressive strength of rice husk ash concrete had one input layer with ten neurons, two hidden layers with up to ten neurons each, and one output layer with one neuron. ANN performed the best of all the methods used in that study, with an R2 value of 0.98 [28].

Waris et al. [57] predicted the compressive strength of concrete with different cement replacement material (CRM) ratios using image processing and ANN. Eighteen concrete cylinders were cast with CRM ratios ranging from 0 to 25%. Nine cylinders were tested for their 14-day compressive strength in a compression testing machine (CTM). The remaining nine cylinders were cut into three slices each, and a DSLR camera captured images of the slice surfaces. The captured images were converted to greyscale and resized to 256x256 resolution. The statistical features of mean, median and standard deviation were extracted from the pixels of the images; these features were taken as inputs, and the compressive strength, tensile strength and slump were taken as outputs. The ANN architecture had two hidden layers with ten hidden neurons, and it performed well with an R2 value of 0.9865.

Asteris and Mokos [9] predicted the compressive strength of concrete using a backpropagation neural network (BPNN). The study correlates the experimental readings from the ultrasonic pulse velocity (UPV) test and Schmidt’s rebound hammer test with the compressive strength of concrete. Although these tests can directly indicate the durability or compressive strength of concrete, they often exhibit large dispersion in their results and deviation from the actual compressive strength. Hence, to overcome these limitations, the BPNN was applied with the UPV and Schmidt rebound hammer values as inputs. Three BPNN architectures were used: the first (2-25-1) had two neurons in the input layer, 25 in the hidden layer, and one in the output layer; the second (1-26-1) had one input neuron, 26 hidden neurons and one output neuron; the third (1-28-1) had one input neuron, 28 hidden neurons and one output neuron. A hyperbolic sigmoid transfer function was used as the activation function at the hidden and output layers. The optimum architecture was 2-25-1 when both inputs were used; 1-28-1 performed better when only UPV was used as the input, and 1-26-1 when only the rebound hammer was used.

Naderpour et al. [58] used a backpropagation neural network to predict the compressive strength of environmentally friendly concrete, drawing on a dataset of recycled aggregate concrete compiled from various literature. The neural network was built with six inputs, one hidden layer, one output, a backpropagation training algorithm and a log-sigmoid activation function. The optimum number of nodes in the hidden layer was 18. The model's performance was satisfactory, with a regression coefficient of 0.829 in testing.

Dogan et al. [59] predicted the compressive strength of concrete using image processing and an ANN. One hundred forty-four concrete cylinders of 150mm diameter and 300mm length were prepared in the laboratory. A 50mm portion was cut from the top, the surfaces of the 144 samples were photographed using a digital camera in a controlled environment, and a compression test was then performed on the samples to obtain the compressive strength as output data. Each image was converted to greyscale at a resolution of 3888 x 2592 pixels, creating a matrix of greyscale values of size 3888 x 2592. This was simplified by taking the mean, median and standard deviation along the matrix rows, forming a final matrix of 3 x 2592 that served as the ANN input. The hidden layer consisted of 16 nodes, and the ANN was trained using the Levenberg-Marquardt training algorithm. The model performed exceptionally well, with an R2 value of 0.98.

Li et al. [54] used image processing with neural networks to predict the compressive strength of concrete. Microstructure images of the concrete were taken using micro-computed tomography, and the grey-level histogram revealed the quantities of the different substances present. The grey values of the pixels form the grey-level matrix, and a co-occurrence matrix capturing the surface texture is also computed. A neural network was trained with the eigenvalues of the grey-level and co-occurrence matrices as inputs, a hidden layer with ten neurons, and an output layer with a single compressive-strength neuron; backpropagation was used as the training algorithm. The method was compared with traditional ML techniques such as MLR and GPR: the ANN had the lowest MAE of 4.179, followed by 4.46 for GPR and 4.95 for MLR.
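
Grey-level co-occurrence texture features of this kind can be computed with scikit-image; the sketch below assumes scikit-image >= 0.19 (`graycomatrix`/`graycoprops`) and uses the standard Haralick-style properties as a simpler stand-in for the eigenvalue features of the original study.

```python
# Hedged sketch of grey-level co-occurrence texture features from a
# placeholder micro-CT image.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
micro_ct = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)  # placeholder image

glcm = graycomatrix(micro_ct, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
texture = [graycoprops(glcm, p)[0, 0]
           for p in ("contrast", "homogeneity", "energy", "correlation")]
```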

Dogan et al. [60] estimated the compressive strength of concrete using statistical features of concrete images. Sixty 150mm cubes were cast in a laboratory-controlled environment, and surface images were taken at 7 and 28 days along with NDT measurements such as ultrasonic pulse velocity. Each image was converted into a matrix whose entries correspond to pixel values, and statistical features such as the mean, standard deviation and median were extracted from the matrix and used as ANN inputs; a Levenberg-Marquardt backpropagation algorithm trained the ANN. Four ANN models were used, one each for the mean, standard deviation and median and one combining all three, with hidden layers of 12, 8, 26 and 14 neurons, respectively. The ANN performed exceptionally well, with an R2 value of 0.99.

Onal and Ozturk [61] predicted the compressive strength of cement mortar by relating its microstructure to compressive strength using a feed-forward backpropagation neural network (ANN). For this purpose, 130 samples of 50mm cement mortar cubes were prepared with four different chemical admixture doses and tested for compressive strength at 1, 2, 7, 28 and 90 days. Before the compressive strength tests, microstructural studies were conducted on the polished surfaces of the cubes. Features such as dendrite length, pore length, pore area ratio, average roundness, and the areas of hydrated and un-hydrated parts were used as ANN inputs. The ANN performed well, with an R2 value of 0.9971.

An ANN model was developed by Özcan [32] to predict the compressive strength of silica fume concrete. The architecture had six input features in the input layer, an optimum of 11 hidden neurons found by trial and error, and a single compressive-strength output. The ANN was trained using the Levenberg-Marquardt training algorithm, with logarithmic sigmoid and pure linear transfer functions as activation functions. With an R2 of 0.977, the ANN outperformed the fuzzy logic method also examined in this study.

The ANN model studied by Sarıdemir et al. [31] for predicting the compressive strength of concrete consisted of five input features (specified in Table 5), six hidden neurons in the first hidden layer, five in the second hidden layer, and compressive strength as the output neuron. The backpropagation training algorithm and sigmoid activation function were used. Compared with fuzzy logic, also undertaken in this study, the ANN performed better, with an R2 value of 0.981 in testing.

Topcu and Sarıdemir [33] compared ANN with fuzzy logic for predicting the compressive strength of concrete containing fly ash. The ANN architecture had nine input features, 11 hidden neurons in the hidden layer, and compressive strength as the output. Backpropagation and the sigmoid function were used as the training algorithm and activation function, respectively. The ANN performed marginally worse than fuzzy logic in this study, with an R2 value of 0.997 versus 0.998.

Baykasoglu et al. [10] defined neural network (NN) models to predict the compressive strength of cement. Sixty-five NN models were trained with different numbers of hidden layers (1 to 3) and hidden neurons (7, 10 or 13), using the Delta-Bar-Delta training algorithm. The best NN model secured an R2 value of 0.695.

Akkurt et al. [62] predicted the compressive strength of cement mortar using a hybrid of a genetic algorithm and an artificial neural network (GA-ANN). A total of 150 data sets were collected from a cement plant for this study and divided into training and testing sets using the GA, so that both sets spanned almost the same range and their averages matched that of the entire data set. This careful partitioning of the data ensures optimal learning by the ANN. A sensitivity analysis with different input parameters showed that increases in tri-calcium silicate, sulfur trioxide and surface area increased the compressive strength of the cement mortar.

Guang and Zong [20] predicted the compressive strength of concrete using a multi-layer feed-forward neural network (MFNN) with 11 input features. The network consisted of 11 input neurons, seven hidden neurons in the hidden layer and one output neuron, with backpropagation as the training algorithm. In the first phase of training, 65 concrete cubes of 150mm were cast in the laboratory, of which 50 were used for learning and 15 for testing. In the second phase, 100 data sets were collected from a cement plant, of which 85 were used for learning and 15 for testing. The method proved reasonably accurate in estimating the strength of concrete.

Deep machine learning

This section discusses deep machine learning models such as deep neural networks, convolutional neural networks, deep residual networks, deep belief networks, and similar algorithms. The methods, the features used, the number of data samples and the data sources are summarised in Table 7, and the performance evaluations are summarised in Table 8.

Table 7 Summary of Deep Machine Learning algorithms w.r.t features and data points
Table 8 Summary of Deep Machine Learning algorithms with respect to the evaluation metrics

Ly et al. [65] predicted the compressive strength of rubber concrete using a deep neural network (DNN). 233 data sets from different studies were used for training; this dataset 1 contained 12 inputs, as listed in Table 7. Two supplementary datasets were also gathered from the literature: dataset 2 (187 samples and four input parameters, adding the compressive strength of the control concrete as a new variable) and dataset 3 (183 samples and seven input parameters, adding the density of the concrete as a new variable). The DNN architecture had three hidden layers with neurons ranging from 1 to 20, trained using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton backpropagation algorithm. The structure with 12 input nodes, 16, 14 and 3 nodes in the three hidden layers, and one output node performed best, showing a regression coefficient of R = 0.98.
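
The best-performing topology can be sketched compactly with scikit-learn, whose L-BFGS solver is a limited-memory variant of the BFGS quasi-Newton method used in the study; the data below is a random placeholder for the 233-sample rubber-concrete dataset.

```python
# Hedged sketch of the 12-16-14-3-1 DNN trained with a quasi-Newton solver.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((233, 12))     # 12 mix-design inputs (placeholder)
y = rng.random(233) * 60      # compressive strength in MPa (placeholder)

dnn = MLPRegressor(hidden_layer_sizes=(16, 14, 3), solver="lbfgs",
                   max_iter=5000, random_state=0)
dnn.fit(X, y)
```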

Guo et al. [66] developed 3-dimensional microstructure images of concrete to estimate its compressive strength using a deep belief network (DBN). The 3-D microstructure of the cement was obtained by micro-computed tomography. From the 3-D images, a grey-level histogram (indicating the hydrating components of the cement) and a grey-level co-occurrence matrix (representing the spatial relationships in the hydrated cement) were derived. Statistical features such as mean, energy, entropy, kurtosis, variance and skewness were extracted from the grey-level histogram, and features such as IDM (inverse difference moment), energy, entropy and correlation were extracted from the grey-level co-occurrence matrix. The extracted features formed the input of the DBN, whose training was parallelised on CUDA (compute unified device architecture). The DBN performed well, with an MAE of 2.81 MPa.
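
The histogram descriptors are standard statistics; a minimal sketch using numpy and scipy on a placeholder micro-CT slice follows (the original study's exact definitions may differ).

```python
# Hedged sketch: statistical features of a grey-level histogram
# (mean, energy, entropy, kurtosis, variance, skewness).
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
slice_img = rng.integers(0, 256, size=(256, 256))        # placeholder slice
hist, _ = np.histogram(slice_img, bins=256, range=(0, 256))
p = hist / hist.sum()                                    # normalised histogram
levels = np.arange(256)

mean = (levels * p).sum()
variance = ((levels - mean) ** 2 * p).sum()
energy = (p ** 2).sum()
entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()
features = [mean, energy, entropy,
            kurtosis(slice_img.ravel()), variance, skew(slice_img.ravel())]
```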

Abuodeh et al. [67] used deep machine learning techniques to predict the compressive strength of ultra-high-performance concrete (UHPC). A backpropagation neural network (BPNN) was implemented with sequential feature selection (SFS), and a neural interpretation diagram (NID) was used to verify the parameters selected by SFS. Eight constituents from 110 UHPC compression records were analysed by executing SFS and NID, and the most influential constituents, which improved the BPNN, were selected. Abram's compressive strength model was later modified using the selected influential features. The eight features analysed were cement, silica fume, fly ash, sand, steel fibre, quartz powder, water and admixture; the four influential features were cement, fly ash, silica fume and water. The BPNN with SFS and NID performed significantly better, with an R2 value of 0.801 compared with 0.215 before SFS and NID were applied.
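
Sequential feature selection of this kind is available in scikit-learn (>= 0.24); the sketch below uses an MLP as a stand-in for the study's BPNN and random placeholder data for the 110 UHPC records.

```python
# Hedged sketch: forward sequential feature selection over eight
# UHPC constituents, keeping the four most influential.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neural_network import MLPRegressor

cols = ["cement", "silica_fume", "fly_ash", "sand",
        "steel_fibre", "quartz_powder", "water", "admixture"]
rng = np.random.default_rng(0)
X = rng.random((110, 8))
y = rng.random(110) * 150     # UHPC strength in MPa (placeholder)

sfs = SequentialFeatureSelector(
    MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
    n_features_to_select=4, direction="forward", cv=5)
sfs.fit(X, y)
selected = [c for c, keep in zip(cols, sfs.get_support()) if keep]
```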

Huynh et al. [63] applied deep and shallow machine learning to predict the compressive strength of geopolymer concrete blended with fly ash, on the premise that conventional compressive strength prediction for geopolymer concrete requires a large volume of raw material, expensive equipment and extensive time. Deep neural networks (DNN), deep residual networks and ANN were employed. The DNN architecture comprised seven neurons in the input layer, 128 neurons in the first hidden layer and 256 in the second. The deep residual network likewise had 7, 128 and 256 neurons in the input and first two hidden layers, plus a third hidden layer with 256 neurons for the addition operation. The ANN consisted of 7 input neurons and 384 hidden neurons in the first hidden layer. The Adam gradient descent method was used to update the weights, and ReLU was used as the activation function. The deep residual network performed best with an R2 value of 0.934, followed by the DNN and ANN with R2 values of 0.912 and 0.893, respectively.
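
One possible reading of the residual architecture described above is sketched in PyTorch below; the exact wiring of the skip connection in the original network is an assumption.

```python
# Hedged sketch of a 7-128-256 residual MLP with a 256-neuron layer
# whose output is added back to the previous hidden state.
import torch
import torch.nn as nn

class ResidualMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(7, 128)
        self.fc2 = nn.Linear(128, 256)
        self.fc3 = nn.Linear(256, 256)   # third hidden layer for the addition
        self.out = nn.Linear(256, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        h1 = self.relu(self.fc1(x))
        h2 = self.relu(self.fc2(h1))
        h3 = self.relu(self.fc3(h2) + h2)   # residual (skip) connection
        return self.out(h3)

model = ResidualMLP()
optimizer = torch.optim.Adam(model.parameters())   # Adam, as in the study
pred = model(torch.rand(4, 7))                     # 4 mixes, 7 inputs each
```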

Jang et al. [68] estimated the compressive strength of concrete using image processing based on deep convolutional neural networks (DCNN), proposing three DCNN algorithms (AlexNet, GoogleNet and ResNet). Three concrete cylinders of 100mm diameter were cast with OPC for each of three water-cement ratios (0.33, 0.5, 0.68) and tested at 3, 7 and 28 days, giving 27 samples. A portable digital microscope captured microscopic images of the top and bottom surfaces of each cylinder, with up to 200 images of a single surface at different sections and illumination levels. Data augmentation such as rotation and flipping was also applied. The final output layer of AlexNet, GoogleNet and ResNet was modified to a Euclidean loss function, the weights were updated using backpropagation, and ReLU was the activation function. The images were also analysed with an ANN for comparison. The DCNN based on ResNet, with an R2 of 0.764, performed better than the other two DCNN models (GoogleNet with an R2 of 0.748 and AlexNet with 0.745), while the ANN performed poorly with an R2 of 0.2.
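
The study's exact networks are not reproduced here, but the general recipe of converting a classification DCNN into a strength regressor is sketched below in PyTorch, assuming torchvision >= 0.13 for the `weights` argument; ResNet-18 and the hyperparameters are illustrative choices.

```python
# Hedged sketch: replace a ResNet classification head with a single
# linear output and train against a mean-squared (Euclidean) loss.
import torch
import torch.nn as nn
from torchvision import models

resnet = models.resnet18(weights=None)
resnet.fc = nn.Linear(resnet.fc.in_features, 1)   # regression head

criterion = nn.MSELoss()                          # Euclidean loss
optimizer = torch.optim.SGD(resnet.parameters(), lr=1e-3)

images = torch.rand(2, 3, 224, 224)               # placeholder microscope images
strengths = torch.tensor([[32.5], [41.0]])        # placeholder labels (MPa)
loss = criterion(resnet(images), strengths)
loss.backward()
optimizer.step()
```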

Shin et al. [22] estimated the compressive strength of concrete using a digital vision-based method built on deep convolutional neural networks with three modified algorithms, namely ConcNet_A, ConcNet_G and ConcNet_R. ConcNet_A is an improved version of AlexNet in which learning is accelerated using ReLU; ConcNet_G is a modified version of GoogleNet that uses 22 layers to overcome overfitting; and ConcNet_R is a modified version of ResNet in which 50 layers are used instead of ResNet's 152, since the training images are considerably smaller than usual. An image size of 84 x 84 was chosen, randomly cropped from 112 x 112 resolution images, and data augmentation such as flipping was also applied. ConcNet_R gave the best predictions, with an RMSE of 3.56.

A digital image correlation (DIC) technique was used by Afrazi et al. [70] to identify the surface displacement field and crack initiation and propagation in quasi-brittle materials, and a numerical finite element program was developed to predict the same; the results were consistent with physical measurements. Majedi et al. [71] developed a micromechanical model to overcome the contact problem in finite element modelling, which improved the results and was comparable with the experimental results.

Deng et al. [53] used a CNN to predict the compressive strength of recycled concrete. The concrete samples were prepared with recycled fine as well as coarse aggregates: mix designs with 16 different water-cement ratios, 16 fly ash replacement ratios, 21 coarse aggregate replacement ratios and 21 fine aggregate replacement ratios were prepared, totalling 74 samples. The performance of the CNN was compared with BPNN and SVM; four inputs were considered, the BPNN architecture used nine hidden nodes in the hidden layer, and the CNN was trained for 33 epochs. The prediction capability of the CNN was better than that of the BPNN and SVM.

Nguyen et al. [64] used a deep neural network to predict the compressive strength of foamed concrete, comparing its performance with a conventional ANN and the second-order ANN used by Fan et al. [66]. A higher-order DNN was developed based on the second-order neuron architecture: the higher-order neuron comprises three activation functions, with the second-order neuron as a particular case, and uses the ReLU activation function instead of the sigmoid used in second-order neurons. The higher-order DNN performed best, with a regression coefficient of 0.97, followed by 0.93 for the second-order network and 0.90 for the conventional ANN.
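
The higher-order neuron is not fully specified in this summary; as a hedged illustration, the PyTorch sketch below implements a second-order (quadratic) neuron layer with ReLU, of which the study's higher-order neuron can be seen as a generalisation.

```python
# Hedged sketch: each output combines a quadratic form of the inputs
# with the usual linear term, passed through ReLU.
import torch
import torch.nn as nn

class SecondOrderLayer(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_out, n_in, n_in) * 0.01)  # quadratic weights
        self.linear = nn.Linear(n_in, n_out)                          # linear term + bias

    def forward(self, x):
        quad = torch.einsum("bi,oij,bj->bo", x, self.W, x)  # x^T W_o x per output o
        return torch.relu(quad + self.linear(x))

layer = SecondOrderLayer(n_in=5, n_out=8)
out = layer(torch.rand(4, 5))   # 4 foamed-concrete mixes, 5 inputs each
```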

Li et al. [72] estimated the compressive strength of cement using a CNN and image processing. Microstructure images of the concrete were taken using micro-computed tomography, in which different grey levels correlate with different substances after hydration. The microstructure images form the input layer of the CNN, and the convolution layers, whose kernels abstract features from the concrete images, perform feature extraction and mapping. The CNN architecture in this study consisted of one input layer, two convolution layers, two sub-sampling layers, one fully connected layer and one output layer. The method performed exceptionally well, with an MAE of 3.606 MPa.
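
The topology described above (two convolution layers, two sub-sampling layers, one fully connected layer, one output) can be sketched in PyTorch as follows; the kernel sizes, channel counts and 64 x 64 input are illustrative assumptions.

```python
# Hedged sketch of a LeNet-style regression CNN for micro-CT slices.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # conv + sub-sampling
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # conv + sub-sampling
    nn.Flatten(),
    nn.Linear(16 * 13 * 13, 64), nn.ReLU(),                       # full connection
    nn.Linear(64, 1),                                             # compressive strength
)
pred = cnn(torch.rand(2, 1, 64, 64))   # two greyscale micro-CT slices
```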

Barkhordari et al. [73] predicted the compressive strength of fly ash concrete using several deep neural network ensembles: a simple averaging ensemble, a weighted averaging ensemble, a super learner, an integrated stacking ensemble, and a separate stacking ensemble built with various regressors. Six DNNs served as base learners. The simple averaging ensemble takes the average of the six DNNs, the weighted averaging ensemble takes their weighted average, and the stacking ensembles merge the sub-models into a single model using regressors such as SVM, AdaBoost, RF, Bagging and Gradient boosting. The integrated stacking ensemble used a neural network with the six DNN sub-models feeding the input layer, with the number of neurons in its single hidden layer determined by trial and error.
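
The averaging ensembles reduce to a few lines of numpy, as the sketch below shows; the predictions and weights are illustrative placeholders, and in practice the weights would be fitted on a validation set.

```python
# Hedged sketch of simple and weighted averaging over six DNN sub-models.
import numpy as np

preds = np.array([
    [31.2, 44.5], [29.8, 46.1], [30.5, 45.0],   # placeholder predictions (MPa)
    [32.0, 43.8], [30.1, 45.6], [31.5, 44.9],   # from six DNN sub-models
])                                              # shape: (6 models, n samples)

simple_avg = preds.mean(axis=0)
weights = np.array([0.25, 0.10, 0.20, 0.15, 0.10, 0.20])  # sum to 1
weighted_avg = weights @ preds
```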

Conclusion and future research

Challenges and concluding remarks

Compressive strength is one of concrete's most important mechanical properties, and determining it requires expensive equipment and is time-consuming. AI-based techniques showed promising results in predicting the compressive strength of different types of concrete with various cementitious materials and cement mortars, and can significantly reduce the cost, labour, time and material involved. For this purpose, this study reviewed literature from various databases and summarised the findings, techniques used, datasets, number of data points, evaluation parameters and performances. The following conclusions and challenges were drawn from the studies.

Generalisation

Several types of concrete, such as concrete blended with cementitious and waste materials (fly ash, GGBS, silica fume, rubber, rice husk ash), have been modelled using soft computing techniques; however, a unified approach to generalize across all concrete types has not been pursued.

Sensitivity

Soft computing techniques are sensitive to the parameters chosen to build the model and the parameters used to build the dataset. Generating synthetic data or collecting extensive data can help in these scenarios.

Precision

A good prediction of the compressive strength of concrete or cement should deviate as little as possible from the actual value across different hydration chronologies on various days. Soft computing techniques have been shown to produce errors of different magnitudes at various hydration levels; this needs to be minimized.

Large scale application

Although much research is being carried out in this domain, its practical application to actual structures at scale remains minimal. Soft computing techniques such as digital vision and image processing can help capture more extensive data and produce quicker results, and the chemical formulation of cement and other materials could also be exploited to predict compressive strength.

Future research

After analysing the research to date, to the authors' best knowledge, research on predicting compressive strength with soft computing techniques has focused mainly on developing models and predicting the compressive strength of hydrated or partially hydrated concrete. Efforts to predict the compressive strength of hardened concrete while it is still in its fresh, un-hydrated state are still lacking. Thus, the authors propose a novel research direction: predicting the compressive strength of hardened concrete while the concrete is still fresh and workable.