1 Introduction

Among the numerous trends and developments in the construction sector over the last decades, the introduction of self-compacting concrete (SCC) is of high interest for the exploitation of alternative raw materials, by-products, wastes and secondary materials as mineral additives. SCC is commonly characterized as a special concrete with enhanced fresh properties, such as increased flowability and good segregation resistance, and it can settle under its own weight even in the presence of congested reinforcement in deep and narrow element sections of nonconventional geometry. SCC therefore consolidates itself without internal or external vibration during placing, thus avoiding segregation and bleeding while maintaining its stability [4, 23]. Moreover, the potential use of SCC in lightweight applications has drawn significant attention [32].

Due to its complex composition, a proper mix design process is necessary for SCC to achieve its desirable properties. This design process must take the available materials into account and proportion them with one or more mineral and chemical admixtures. The challenge of improving the grain size distribution and particle packing, and thus ensuring greater cohesiveness of the SCC, is addressed by seeking the optimum balance between coarse materials, fine materials and chemical admixtures. According to [18], variations in the cement and/or mineral additives due to changes in the production process, as well as in the aggregate type, may cause large variations in the properties of fresh SCC. It is therefore of great importance to have a robust mixture that is minimally affected by external sources of variability. To this end, the utilization of powdered industrial by-products and wastes as environmentally friendly mineral additives for the production of lightweight SCC has attracted the attention of researchers [11, 32, 35, 37]. Furthermore, a wide variety of secondary materials is available for the mix design process, such as limestone powder (LP), fly ash (FA), ground granulated blast furnace slag (GGBFS), silica fume (SF) and rice husk ash (RHA), together with chemical admixtures such as new-generation superplasticizers (SP) and viscosity-modifying admixtures (VMA) [4, 11, 17, 18, 23, 32, 35, 37, 56, 67].

Artificial neural networks (ANNs) have emerged over the last decades as an attractive meta-modelling technique applicable to a vast number of scientific fields, including material science. The main characteristic of this method is that a surrogate model can be constructed through a training process with only a few available data and then used to predict pre-selected model parameters, reducing the need for time-consuming and expensive experiments. The literature includes publications in which ANNs were used to predict the compressive strength and modulus of elasticity [22, 39, 63, 64] and to model the characteristics of concrete materials [1, 13, 44, 66]. Related methods such as fuzzy logic and genetic algorithms have also been used for modelling the compressive strength of concrete materials [3, 12, 46]. Detailed and in-depth state-of-the-art reports can be found in [2, 42, 43, 57].

In this context, the present work applies properly trained ANN models to the prediction of the 28-day compressive strength of admixture-based self-compacting concrete. The database consists of 205 specimens taken from the literature, whose mixture compositions have comparable physical and chemical properties. The developed ANN models take 11 SCC composition parameters into consideration in order to predict the compressive strength, and they proved very successful, exhibiting highly reliable predictions.

2 Artificial neural networks

This section summarizes the mathematical and computational aspects of artificial neural networks. In general, ANNs are information-processing models configured for a specific application through a training process. A trained ANN maps a given input onto a specific output and can thereby be considered similar to a response surface method. The main advantage of a trained ANN over conventional numerical analysis procedures (e.g. regression analysis) is that its results are produced with much less computational effort [2, 5–8, 30, 33, 51–53].

2.1 General

The concept of an artificial neural network is based on that of the biological neural network of the human brain. The basic building block of the ANN is the artificial neuron, a mathematical model that tries to mimic the behaviour of the biological neuron. Information is passed into the artificial neuron as input and is processed by a mathematical function, leading to an output that determines the behaviour of the neuron (similar to the fire-or-not response of the biological neuron). Before the information enters the neuron, it is weighted in order to approximate the random nature of the biological neuron. A group of such neurons constitutes an ANN, in a manner similar to biological neural networks. In order to set up an ANN, one needs to define (i) the architecture of the ANN, (ii) the training algorithm used for the ANN learning phase, and (iii) the mathematical functions describing the mathematical model. The architecture, or topology, of the ANN describes how the artificial neurons are organized in the group and how information flows within the network. For example, if the neurons are organized in more than one layer, the network is called a multilayer ANN. The training phase of the ANN can be considered a function minimization problem, in which the optimum values of the weights are determined by minimizing an error function. Depending on the optimization algorithm used for this purpose, different types of ANN exist. Finally, the two mathematical functions that define the behaviour of each neuron are the summation function and the activation function. In the present study, we use a back-propagation neural network (BPNN), which is described in the next section.

2.2 Architecture of BPNN

A BPNN is a feed-forward, multilayer network, i.e. information flows only from the input towards the output with no back loops, and the neurons of the same layer are not connected to each other but are connected to all the neurons of the previous and subsequent layers. A BPNN has a standard structure that can be written as

$$ \mathbf{N}-{\mathbf{H}}_1-{\mathbf{H}}_2-\cdots -{\mathbf{H}}_{\mathbf{NHL}}-\mathbf{M} $$
(1)

where N is the number of input neurons (input parameters), $H_i$ is the number of neurons in the ith hidden layer for $i = 1, \dots, \mathrm{NHL}$, NHL is the number of hidden layers and M is the number of output neurons (output parameters). Figure 1 depicts an example of a BPNN composed of an input layer with five neurons, two hidden layers with four and three neurons, respectively, and an output layer with two neurons, i.e. a 5-4-3-2 BPNN.

Fig. 1 A 5-4-3-2 BPNN
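To make the notation of Eq. (1) concrete, the following sketch (plain NumPy; the names and random initialization are illustrative, not the paper's implementation) sets up the weight matrices and bias vectors implied by the 5-4-3-2 topology of Fig. 1:

```python
import numpy as np

# Illustrative parameterization of a BPNN: one weight matrix and one bias
# vector per pair of consecutive layers (assumed names, not from the paper).
layer_sizes = [5, 4, 3, 2]  # N - H1 - H2 - M, as in Eq. (1)

rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n))
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(m) for m in layer_sizes[1:]]

# weights[0] has shape (4, 5): it connects the 5 inputs to the first hidden layer.
```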

The notation for a single neuron of a hidden layer (with the corresponding R-element input vector) is presented in Fig. 2.

Fig. 2 A neuron with a single R-element input vector

For each neuron i, the individual element inputs $p_1, \dots, p_R$ are multiplied by the corresponding weights $w_{i,1}, \dots, w_{i,R}$, and the weighted values are fed to the summation junction, where the dot product $\boldsymbol{W} \cdot \boldsymbol{p}$ of the weight vector $\boldsymbol{W} = [w_{i,1}, \dots, w_{i,R}]$ and the input vector $\boldsymbol{p} = [p_1, \dots, p_R]^{T}$ is formed. The threshold b (bias) is added to the dot product, forming the net input n, which is the argument of the transfer function ƒ:

$$ n=\boldsymbol{W}\cdot \boldsymbol{p}+ b={w}_{i,1}{p}_1+{w}_{i,2}{p}_2+\dots +{w}_{i,R}{p}_R+ b $$
(2)
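As a small illustration of Eq. (2), a single neuron reduces to a weighted sum plus a bias, passed through a transfer function; the input, weight and bias values below are arbitrary:

```python
import numpy as np

def neuron(p, w, b, f):
    """Single artificial neuron: net input n = W . p + b (Eq. 2), passed through f."""
    n = np.dot(w, p) + b
    return f(n)

logistic = lambda n: 1.0 / (1.0 + np.exp(-n))  # logistic sigmoid transfer function

p = np.array([0.2, -0.5, 0.9])   # R = 3 element input vector (arbitrary values)
w = np.array([0.4, 0.1, -0.3])   # weights w_{i,1}, ..., w_{i,R}
out = neuron(p, w, b=0.05, f=logistic)
```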

The choice of the transfer (or activation) function f may strongly influence the complexity and performance of the ANN. Although sigmoidal transfer functions are the most commonly used, other types of functions may be employed; previous studies [9, 36] have proposed a large number of alternatives. In the present study, the logistic sigmoid and the hyperbolic tangent transfer functions were found to be appropriate for the problem investigated. During the training phase, the training data are fed into the network, which tries to create a mapping between the input and the output values. This mapping is achieved by adjusting the weights so as to minimize the following error function

$$ E=\sum {\left({t}_i-{o}_i\right)}^2 $$
(3)

where $t_i$ and $o_i$ are the exact value and the network prediction, respectively; the minimization is carried out within an optimization framework. The training algorithm used for this optimization plays a crucial role in building a quality mapping, and an exhaustive investigation was performed in order to find the most suitable one for this problem. The most common method in the literature is the back-propagation technique, in which, as its name states, the information propagates through the network in a backward manner in order to adjust the weights and minimize the error function. To adjust the weights properly, a general method called gradient descent is applied, in which the gradients of the error function with respect to the network weights are calculated. Further discussion of the training algorithms is given in the numerical example section.
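A minimal sketch of one such gradient-descent update, written for a single linear output neuron with the squared-error term of Eq. (3); a full BPNN back-propagates the same kind of gradients through every layer:

```python
import numpy as np

def gd_step(w, b, p, t, lr=0.01):
    """One gradient-descent update for a linear neuron o = w . p + b on E = (t - o)^2."""
    o = np.dot(w, p) + b
    dE_do = -2.0 * (t - o)     # derivative of the error w.r.t. the output
    w = w - lr * dE_do * p     # chain rule: dE/dw = dE/do * do/dw
    b = b - lr * dE_do         # dE/db = dE/do
    return w, b

w, b = np.zeros(3), 0.0
w, b = gd_step(w, b, p=np.array([1.0, 2.0, 0.5]), t=3.0)  # toy data point
```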

2.3 Dealing with overfitting

One of the most common problems during the training phase of an ANN is overfitting. In this situation, the network has learned the available training data very well (the error function takes a very small value), but when new data are presented to the network, the error increases significantly and the network's predictions are poor. In order to prevent overfitting, several techniques, algorithms and criteria have been proposed for determining the number of hidden layers as well as the number of neurons in each layer. Furthermore, the training of the ANN can be terminated before the network has the opportunity to fit the training data too closely (early stopping), and a regularization term can be added to the objective function in order to smooth the mapping [7, 8, 14–16, 20, 30, 38, 47].
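As one concrete illustration (a scikit-learn sketch, not the paper's own setup), both remedies, early stopping on a validation split and an L2 regularization term, can be enabled as follows:

```python
from sklearn.neural_network import MLPRegressor

# Hold out part of the training data and stop when the validation score
# stops improving; alpha adds an L2 penalty to the objective function.
net = MLPRegressor(
    hidden_layer_sizes=(11, 5),  # e.g. two hidden layers
    early_stopping=True,         # monitor a validation split
    validation_fraction=0.15,    # share of training data held out
    n_iter_no_change=10,         # patience, in epochs
    alpha=1e-3,                  # L2 regularization strength
    max_iter=1000,
)
```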

2.4 Proposed algorithm

In the present work, a simple heuristic algorithm is proposed in order to obtain a reliable and robust ANN for predicting the 28-day compressive strength of admixture-based self-compacting concrete. The steps of the proposed algorithm are the following (a code sketch of the complete search loop is given after the list):

  1. Step 1: Normalization of data. Normalization is a pre-processing phase that has proved to be the most crucial step in any type of soft-computing problem, including artificial neural network techniques.

  2. Step 2: Development and training of several ANNs. ANNs are developed and trained with the number of hidden layers ranging from 1 to 2 and the number of neurons per layer ranging from 4 to 20. Each ANN is developed and trained for a number (nf) of different activation functions, both with and without the data pre-processing techniques of step 1.

  3. Step 3: Determination of the mean square error. For each of the trained ANNs, the mean square error (MSE) is computed on a set of data (the validation data) that has not been used during the training phase (the training data).

  4. Step 4: Establishment of upper and lower limits. Upper and lower limits are introduced for each output parameter, based on experimental or numerical data as well as on reasonable estimations by the user.

  5. Step 5: Selection of the optimum architecture. The optimum architecture is the one that yields the minimum mean square error while all computed output parameters for all validation data lie between the upper and lower limits.

It should be emphasized that the usefulness of the limits established in step 4 rests on the user's expertise and experience in the specific field, which allow reasonable assumptions to be made.
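The sketch below illustrates steps 2 to 5 in Python, with scikit-learn standing in for the authors' own training code (which uses the Levenberg-Marquardt algorithm discussed in Sect. 3.4); X_train, y_train, X_val, y_val, y_lo and y_hi are assumed to hold the normalized data of step 1 and the user-supplied limits of step 4:

```python
import itertools
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

def select_architecture(X_train, y_train, X_val, y_val, y_lo, y_hi):
    candidates = []
    for n_layers in (1, 2):                                             # step 2: 1-2 hidden layers
        for sizes in itertools.product(range(4, 21), repeat=n_layers):  # 4-20 neurons per layer
            for act in ("logistic", "tanh"):                            # nf activation functions
                net = MLPRegressor(hidden_layer_sizes=sizes, activation=act,
                                   max_iter=2000, random_state=0)
                net.fit(X_train, y_train)
                pred = net.predict(X_val)
                mse = mean_squared_error(y_val, pred)                   # step 3: validation MSE
                if np.all((pred >= y_lo) & (pred <= y_hi)):             # step 4: limits check
                    candidates.append((mse, sizes, act, net))
    if not candidates:
        raise RuntimeError("no architecture satisfied the limits of step 4")
    return min(candidates, key=lambda c: c[0])                          # step 5: minimum MSE
```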

3 Results and discussion

In this section, the algorithm proposed above is applied step by step in order to tune optimum ANNs for the prediction of the 28-day compressive strength of admixture-based self-compacting concrete, based on experimental data available in the literature.

3.1 Experimental

The database used herein consists of 205 mixes obtained from the literature [10, 19, 23–29, 31, 32, 45, 50, 54, 55, 58–62, 65, 67] (Table 1).

Table 1 Experimental data/results and input and output parameters of BPNNs

Each input training vector p is of dimension 1 × 11 and consists of the values of the 11 composition parameters (R = 11), namely the cement (C), the coarse aggregate (CA), the fine aggregate (FA), the water (W), the limestone powder (LP), the fly ash (FA), the ground granulated blast furnace slag (GGBFS), the silica fume (SF), the rice husk ash (RHA) and, as chemical admixtures, the new-generation superplasticizers (SP) and the viscosity-modifying admixtures (VMA). The corresponding output training vectors are of dimension 1 × 1 and contain the value of the 28-day compressive strength of the SCC specimens. Their mean values, together with the minimum and maximum values, are listed in Table 2.

Table 2 The input and output parameters used in the development of BPNNs

3.2 Sensitivity analysis

In general, sensitivity analysis of a numerical model is a technique used to determine whether the output of the model (response, deformation, stresses, etc.) is affected by changes in the model inputs (Young's modulus, fillers, etc.). During the development of an ANN, it is of high importance to know the effect of each of the above 11 composition parameters (network inputs) on the compressive strength of SCC (network output). This feedback shows which input parameters are the most significant; by removing the insignificant ones, the input space is reduced, and consequently both the complexity of the ANN and the time required for its training are reduced. The results obtained from the sensitivity analysis are presented in Fig. 3. Based on these results, the viscosity-modifying admixtures (VMA) parameter has the strongest impact on the compressive strength, while the fine aggregate (FA) parameter has the weakest.

Fig. 3 Sensitivity analysis of the compressive strength to the composition parameters of SCC
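One simple way to estimate such input sensitivities (an illustration only; the paper does not detail its exact procedure) is to perturb each normalized input in turn and record the mean change in the network output:

```python
import numpy as np

def input_sensitivity(net, X, delta=0.05):
    """Mean absolute change in the prediction when input column j is perturbed."""
    base = net.predict(X)
    scores = []
    for j in range(X.shape[1]):          # one pass per composition parameter
        Xp = X.copy()
        Xp[:, j] += delta                # small shift of the normalized input
        scores.append(np.mean(np.abs(net.predict(Xp) - base)))
    return np.array(scores)              # one sensitivity score per input
```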

3.3 Normalization of data

As mentioned previously, the normalization of the input and output parameters has a significant impact on the ANN training. In the present study, the min–max [21] and the z-score normalization methods have been used during the pre-processing stage. In particular, the 11 input parameters (Table 1) as well as the single output parameter have been normalized using the min–max normalization method. As stated in [34], in order to avoid problems associated with low learning rates of the ANN, the data should be normalized within a range defined by appropriate upper and lower limit values of the corresponding parameter. In this work, the input and output parameters have been normalized in the ranges [0, 1] and [−1, 1], respectively. Moreover, a transformation technique called Central has been applied, in which the origin of the training data is shifted to the centre of the data via the following formula:

$$ {z}_i={x}_i-\frac{ \max (x)+ \min (x)}{2} $$
(4)

where $x = (x_1, x_2, \dots, x_n)$ are the original data and $z_i$ is the ith transformed value. The results obtained using this technique for pre-processing the data were better than those obtained with other normalization techniques known from the literature, as will be demonstrated in the numerical examples section.
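Both transformations are straightforward; a sketch with arbitrary example values:

```python
import numpy as np

def min_max(x, lo=0.0, hi=1.0):
    """Min-max normalization of x into the range [lo, hi]."""
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

def central(x):
    """Central transformation of Eq. (4): shift the data so the range is centred on 0."""
    return x - (x.max() + x.min()) / 2.0

x = np.array([310.0, 450.0, 380.0])  # e.g. cement contents in kg/m3 (arbitrary values)
print(min_max(x))    # [0.  1.  0.5]
print(central(x))    # [-70.  70.   0.]
```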

3.4 Training algorithms

In order to find the training algorithm most suitable for the nonlinear behaviour of the SCC compressive strength, the performance of various optimization techniques was investigated, including the quasi-Newton, resilient back-propagation, one-step secant, gradient descent with momentum and adaptive learning rate, and Levenberg-Marquardt methods. It should be mentioned that all the ANNs under study (presented in the next section) were trained with all the aforementioned algorithms. Among these, the best ANN prediction of the output parameter, by far, was achieved using the Levenberg-Marquardt algorithm as implemented in levmar (Lourakis 2005). This algorithm appears to be optimal for training moderate-sized feed-forward neural networks (up to several hundred neurons per layer) on nonlinear problems.
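For illustration, a Levenberg-Marquardt fit of a tiny 1-3-1 tanh network can be set up with SciPy's 'lm' solver (a sketch on toy data; the paper itself relies on the levmar implementation):

```python
import numpy as np
from scipy.optimize import least_squares

X = np.linspace(-1.0, 1.0, 40)
T = X ** 2                                   # toy target values

def forward(theta, x):
    """1-3-1 tanh network; theta packs all 10 weights and biases."""
    w1, b1, w2, b2 = theta[:3], theta[3:6], theta[6:9], theta[9]
    h = np.tanh(np.outer(x, w1) + b1)        # hidden layer, 3 neurons
    return h @ w2 + b2                       # linear output neuron

rng = np.random.default_rng(0)
fit = least_squares(lambda th: T - forward(th, X),  # residuals t_i - o_i
                    x0=rng.normal(scale=0.5, size=10),
                    method="lm")             # Levenberg-Marquardt solver
```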

3.5 BPNN model development

In this work, a total of 91,800 different BPNN models have been developed and investigated. More specifically, these correspond to 18,360 distinct ANN configurations, each implemented on 5 different computers in order to investigate the sensitivity of the ANN results to the very nature of the floating-point arithmetic of each machine. Each ANN model was trained on 113 datasets out of a total of 169 datasets (66.86% of the total number), and the validation and testing of the trained ANN were performed with the remaining 56 datasets: 28 datasets (16.57%) were used for validation and 28 (16.57%) for testing (estimating Pearson's correlation coefficient R). The parameters used for the ANN training are summarized in Table 3. In order to allow a fair comparison of the various ANNs, the datasets were divided manually by the user into training, validation and testing sets, using appropriate indices to state whether each dataset belongs to the training, validation or testing set; in the general case, this division is made randomly.
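A sketch of such a manual, index-based division of the 169 datasets (X and y denote the normalized input and output arrays):

```python
import numpy as np

idx = np.arange(169)                 # one index per dataset
train_idx = idx[:113]                # 113 datasets (66.86%) for training
val_idx = idx[113:141]               # 28 datasets (16.57%) for validation
test_idx = idx[141:]                 # 28 datasets (16.57%) for testing

# X_train, y_train = X[train_idx], y[train_idx]   # likewise for val/test
```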

Table 3 Training parameters of BPNN models

The 91,800 developed ANN models were sorted in decreasing order of Pearson's correlation coefficient, and the architectures of the top 20 models across the five computers used are presented in Table 4. Tables 5, 6, 7, 8 and 9 present the top 20 models for each computer individually. Based on these results, the optimum BPNN model is the 11-11-5-1 one (Fig. 4), with Pearson's correlation coefficient R equal to 0.9828.

Table 4 Ranking of the top twenty best architectures of BPNNs based on Pearson’s correlation coefficient R (all computers)
Table 5 Ranking of the top twenty best architectures of BPNNs based on Pearson’s correlation coefficient R (computer C01)
Table 6 Ranking of the top twenty best architectures of BPNNs based on Pearson’s correlation coefficient R (computer C02)
Table 7 Ranking of the top twenty best architectures of BPNNs based on Pearson’s correlation coefficient R (computer C03)
Table 8 Ranking of the top twenty best architectures of BPNNs based on Pearson’s correlation coefficient R (computer C04)
Table 9 Ranking of the top twenty best architectures of BPNNs based on Pearson’s correlation coefficient R (computer C05)
Fig. 4 The best two-hidden-layer BPNN (11-11-5-1) based on Pearson's correlation coefficient R

Figures 5 and 6 compare the exact experimental values with the values predicted by the optimum BPNN model with topology 11-11-5-1. These results clearly show that the 28-day compressive strengths of admixture-based self-compacting concrete predicted by the multilayer feed-forward neural network are very close to the experimental results.
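The ranking criterion itself is simple to compute; for example, with made-up strength values:

```python
import numpy as np

t = np.array([32.1, 45.0, 51.3, 60.2])   # experimental strengths, MPa (arbitrary)
o = np.array([33.0, 44.1, 52.0, 59.5])   # corresponding network predictions

R = np.corrcoef(t, o)[0, 1]              # Pearson's correlation coefficient
print(f"R = {R:.4f}")
```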

Fig. 5 Pearson's correlation coefficient R between the experimental and predicted compressive strengths for the best two-hidden-layer BPNN (11-11-5-1)

Fig. 6 Experimental vs. predicted values of the compressive strength for the best two-hidden-layer BPNN (11-11-5-1)

From the presented results, the following can be observed:

  • Among the training algorithms available in the literature, the best ANN prediction of the SCC strength, by far, was achieved using the Levenberg-Marquardt algorithm.

  • The computational environment significantly affects the ANN training and, consequently, the network's performance. This is because the underlying algorithms ultimately rely on basic arithmetic operations that can yield different results in different environments owing to the very nature of floating-point arithmetic; indeed, different optimum ANN architectures were found on different computers.

  • Furthermore, the proposed new formula for the normalization of data proved effective and robust compared to the ones available in the literature.

  • For the top 20 models, the optimum number of hidden layers was found to be two.

  • The initial values of weights significantly affect the results; different values of the initial weights result in different optimum ANN architectures.

  • All 20 ANN models presented in Table 4 were trained with between 47 and 73 epochs. This means that the developed multilayer feed-forward neural network models can predict the 28-day compressive strength of admixture-based self-compacting concrete with smaller error rates and less computational effort than the models presented in the literature. Furthermore, these ANN models predict the SCC compressive strength with values of Pearson's correlation coefficient R between 0.98083 and 0.98275 (for the optimum one, see Table 4), while the best values reported in the literature are 0.97 for ANN and 0.98 for fuzzy logic models [41].

4 Conclusions

In this paper, artificial neural networks were trained in order to investigate their capability of predicting the 28-day compressive strength of admixture-based self-compacting concrete. To this end, a novel heuristic algorithm was proposed for finding the optimum architectures of a set of multilayer feed-forward back-propagation neural networks based on the value of Pearson's correlation coefficient. The results showed that the compressive strength values predicted by the trained ANNs are very close to the experimental results, making the ANN a very promising metamodel for predicting the 28-day compressive strength of SCC mixtures. Furthermore, the proposed new formula for the normalization of data proved effective and robust compared to those available in the literature.

In subsequent work, the reverse problem will be investigated, in which the target will be the identification of the optimum ANN configuration when the SCC compressive strength is the input and the 11 composition parameters are the output.