1 Introduction

Due to its outstanding properties, such as high density and durability, low permeability and low shrinkage, high-strength concrete (HSC) has been extensively used for concrete structures, buildings and bridges in recent years. Typically, HSC refers to concrete with a 28-day cylinder compressive strength greater than 40 MPa. Consequently, HSC can resist larger loads than normal-strength concrete.

To make better use of HSC, more mechanical, chemical and physical knowledge of its constituents is needed than for normal-strength concrete. Many efforts have recently been made to investigate the mechanism, properties and performance of HSC [1]. For instance, Min et al. [2] experimentally studied the compressive strength of early-age concrete and concluded that the compressive strength of concrete increases with increasing content of hardening accelerators. Sharmila and Dhinakaran [3] studied the durability of HSC by substituting readymade ultrafine slag for cement. Samimi et al. [4] experimentally investigated the compressive strength of high-strength self-compacting concrete. Dvorkin et al. [5] proposed a mix design approach for HSC materials based on experiments and predicted their compressive strength. Baduge et al. [6] studied the failure mechanism and constitutive relations of HSC.

Compressive strength is one of the most important parameters of HSC and is of vital importance for construction and building [7]; therefore, it is important for engineers to accurately estimate the compressive strength of HSC. Based on linear or nonlinear regression equations, many empirical formulas have recently been proposed by researchers; however, these empirical formulas are less accurate because there are many factors that can affect the compressive strength of HSC, and it is difficult to incorporate these factors into a formula. In this context, accurate estimation of the compressive strength of HSC calls for new and innovative machine learning techniques, such as artificial neural networks (ANNs) [8].

In recent years, many researchers have investigated the potential utilization of ANN techniques to estimate the compressive strength of concrete. For instance, Al-Shamiri et al. [9] predicted the compressive strength of HSC using extreme learning machine (ELM) and back propagation (BP) methods. They concluded that both the ELM and BP methods are reliable for predicting the compressive strength of HSC. Sarıdemir [10] investigated the compressive strength of concretes containing metakaolin (MK) and silica fume (SF) using ANN models. The results showed that the multilayer feedforward neural network model has great potential to accurately predict the compressive strength of concretes containing MK and SF. Chou and Pham [11] developed an ensemble artificial intelligence approach to predict the compressive strength of high-performance concrete (HPC). They confirmed that the ensemble technique has superior prediction accuracy to individual models. Alshihri et al. [12] investigated the compressive strength of structural lightweight concrete (LWC) using neural network models. They concluded that neural network models can be used to predict the compressive strength of LWC mixtures. Chithra et al. [13] conducted a comparative study on the compressive strength prediction models for HPC containing nanosilica and copper slag. They concluded that the ANN model is more accurate than multiple regression analysis for compressive strength prediction of HPC. Duan et al. [14] predicted the compressive strength of recycled aggregate concrete (RAC). The results confirmed that the ANN model has good potential in predicting the compressive strength of RAC. Dantas et al. [15] investigated the compressive strength of concrete containing construction and demolition waste (CDW) using ANN models. They concluded that the ANN model has great potential for the prediction of the compressive strength of concrete containing CDW.

Based on previous studies, it has been confirmed that the prediction performance of ANN techniques is better than that of existing empirical models; however, the ANN model may face some issues, such as a slow convergence rate and reaching a local minimum. Combining neural networks (NNs) and fuzzy logic systems, the adaptive neuro fuzzy inference system (ANFIS) has recently become an attractive modeling technique and has been extensively applied in many fields [16]. However, similar to that of ANNs, the training process of an ANFIS may also have some limitations, including slow convergence and overfitting problems.

The primary purpose of this study is to develop a hybrid model by integrating the firefly algorithm (FFA) and fuzzy c-means (FCM) clustering method into an ANFIS to estimate the compressive strength of HSC. The FFA and FCM techniques were utilized to improve the forecasting accuracy of the proposed ANFIS. A large amount of experimental data was used to build the hybrid ANFIS-FFA model. The five main constituents of HSC and the compressive strength of HSC were considered the input and output variables, respectively. Furthermore, a comparison was conducted among four artificial intelligence models, including the proposed ANFIS-FFA model, the traditional ANFIS, the back propagation neural network (BPNN) and the extreme learning machine (ELM), in terms of four statistical indices. In addition, a detailed parametric analysis was carried out to investigate the effect of each input variable on the compressive strength of HSC.

2 Data collection

HSC data from 324 samples in Ref. [9] were used in this study. According to Ref. [9], HSC mix designs were made by utilizing Type 1 Portland cement, fine aggregate, coarse aggregate, water, and a polycarboxylate superplasticizer admixture. The maximum aggregate size of the coarse aggregate is 20 mm. To compare with the experimental and predicted results in Ref. [9], 244 out of the 324 data points were randomly selected for model training, while the remaining 80 data points were utilized for testing. Five main constituents of HSC, cement (kg/m³), water (kg/m³), fine and coarse aggregates (kg/m³), and superplasticizer (kg/m³), are considered the input variables; meanwhile, the compressive strength of HSC (MPa) is used as the output variable. The statistical results of the collected data are summarized in Table 1. The histogram frequencies of these five input parameters are shown in Fig. 1. Figure 1 illustrates that specific ranges of values exist for these five parameters, and predictions of the compressive strength of HSC are meaningful only in this context.

Table 1 Statistical results of data collected (data from Ref. [9])
Fig. 1 Histograms of the input parameters

3 Methodology

3.1 ANFIS

The ANFIS combines the learning capabilities of both the neural networks and fuzzy logic system. The structure of the ANFIS (see Fig. 2) is mainly composed of premise parts and consequence parts [17, 18], and it can be described as follows [19,20,21,22]:

Fig. 2 Structure of the ANFIS

Layer 1: Each node generates membership grades to a fuzzy set by utilizing the membership functions (MFs):

$$O_{1}^{i} = \mu_{{A_{j}^{k} }} \left( {x_{j} } \right)\quad j = 1,2, \cdots ,m;\quad k = 1,2, \cdots ,n;\quad i = n \times m$$
(1)

where \(x_{j}\) is the input to node j and \(A_{j}^{k}\) is the linguistic label characterized by MF \(\mu_{{A_{j}^{k} }}\). In this study, the Gaussian function is employed as the MF:

$$\mu_{{A_{j}^{k} }} \left( {x_{j} } \right) = \exp \left( { - \frac{{\left( {x_{j} - c_{j} } \right)^{2} }}{{2\sigma_{j}^{2} }}} \right)$$
(2)

where \(\sigma_{j}\) and \(c_{j}\) are the premise parameters of the MFs.

Layer 2: Each node in this rule layer is labeled \(\prod\). Output \(O_{2}^{i}\) of the second layer can be described as follows:

$$O_{2}^{i} = w_{i} = \prod \mu_{{A_{j}^{i} }} \left( {x_{j} } \right)\quad j = 1,2, \cdots m;\;i = 1,2, \cdots ,n$$
(3)

where \(w_{i}\) denotes the firing strength of a rule.

Layer 3: The normalized firing strength in this layer can be calculated as follows:

$$O_{3}^{i} = \overline{w}_{i} = \frac{{w_{i} }}{{\sum\nolimits_{i = 1}^{n} {w_{i} } }}$$
(4)

where \(\overline{w}_{i}\) denotes the normalized firing strength.

Layer 4: The node function of this layer can be calculated as follows:

$$O_{4}^{i} = \overline{w}_{i} f_{i} = \overline{w}_{i} \left( {p_{0}^{i} + p_{1}^{i} x_{1} + p_{2}^{i} x_{2} + \cdots + p_{m}^{i} x_{m} } \right)\;\quad i = 1,2, \cdots ,n$$
(5)

where \(\left\{ {p_{0}^{i} ,p_{1}^{i} , \cdots ,p_{m}^{i} } \right\}\) are often referred to as consequent parameters.

Layer 5: The overall output of this layer can be calculated as follows:

$$O_{5} = \sum\limits_{i = 1}^{n} {\overline{w}_{i} f_{i} } = \frac{{\sum\nolimits_{i = 1}^{n} {w_{i} f_{i} } }}{{\sum\nolimits_{i = 1}^{n} {w_{i} } }}$$
(6)

Due to its good learning capability and excellent performance, the ANFIS is adopted in the present study. The predictive ability of the ANFIS relies on optimally selected Gaussian MF parameters in Eq. (2) (i.e., \(\sigma_{j}\) and \(c_{j}\)). Therefore, the FFA and FCM techniques were utilized to optimize the premise and consequent parameters of the ANFIS in this study.
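The five layers in Eqs. (1)–(6) can be sketched as a single forward pass. The following is a minimal illustration, not the implementation used in this study: it assumes one Gaussian MF per input per rule (as is typical when rules are generated by clustering), and the function name `anfis_forward` and the array layout are hypothetical.

```python
import numpy as np

def anfis_forward(x, centers, sigmas, consequents):
    """Evaluate a first-order Sugeno ANFIS for one input vector.

    x:           shape (m,)      input vector
    centers:     shape (n, m)    Gaussian MF centers, one row per rule (assumed layout)
    sigmas:      shape (n, m)    Gaussian MF widths, strictly positive
    consequents: shape (n, m+1)  [p0, p1, ..., pm] for each rule
    """
    # Layer 1: Gaussian membership grades, Eq. (2)
    mu = np.exp(-((x - centers) ** 2) / (2.0 * sigmas ** 2))
    # Layer 2: firing strength of each rule (product over inputs), Eq. (3)
    w = mu.prod(axis=1)
    # Layer 3: normalized firing strengths, Eq. (4)
    w_bar = w / w.sum()
    # Layer 4: linear rule outputs f_i, weighted in the next step, Eq. (5)
    f = consequents[:, 0] + consequents[:, 1:] @ x
    # Layer 5: overall output as the weighted sum, Eq. (6)
    return float(np.sum(w_bar * f))
```

If the input lies far from every rule center, all firing strengths can underflow to zero and the normalization in Layer 3 becomes undefined; a practical implementation would guard against that case.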

3.2 FCM

FCM is currently one of the most important data clustering methods, and the objective function in the FCM algorithm can be written as follows [23]:

$$J = \sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{n} {\omega_{ij}^{m} } } \left\| {x_{j} - v_{i} } \right\|^{2}$$
(7)

where c represents the total number of clusters; n is the number of data points; \(\omega_{ij} \in \left[ {0,1} \right]\) is the degree of membership of the \(j\)th data point in the \(i\)th cluster; \(\left\| {x_{j} - v_{i} } \right\|\) denotes the Euclidean distance from the \(j\)th data point \(x_{j}\) to the \(i\)th cluster center \(v_{i}\); and \(m \in \left( {1,\infty } \right)\) is a constant that controls the fuzziness of the partition.

In the FCM approach, the cluster center \(v_{i}\) and the degree of membership \(\omega_{ij}\) can be calculated by the following equations:

$$v_{i} = \frac{{\sum\nolimits_{j = 1}^{n} {\omega_{ij}^{m} x_{j} } }}{{\sum\nolimits_{j = 1}^{n} {\omega_{ij}^{m} } }}\quad i = 1,2, \cdots ,c$$
(8)
$$\omega_{ij} = \frac{1}{{\sum\nolimits_{k = 1}^{c} {\left( {\frac{{\left\| {x_{j} - v_{i} } \right\|}}{{\left\| {x_{j} - v_{k} } \right\|}}} \right)^{{\frac{2}{{\left( {m - 1} \right)}}}} } }}$$
(9)
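The alternating updates of cluster centers and memberships can be sketched as follows. This is a minimal illustration under stated assumptions rather than the routine used in this study: the center update uses memberships raised to the power m (the standard FCM form that minimizes Eq. (7)), and the function name `fcm`, the random initialization, the stopping tolerance and the default m = 2 are all hypothetical choices.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means on data X of shape (n, d); returns centers V (c, d) and memberships U (c, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each column (one data point) sums to 1
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Center update: membership-weighted mean of the data (weights are omega^m)
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Distances ||x_j - v_i||, shape (c, n); a small floor avoids division by zero
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        D = np.maximum(D, 1e-12)
        # Membership update, algebraically equal to Eq. (9)
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U
```

Each resulting cluster can then seed one fuzzy rule, with the cluster center providing an initial value for the corresponding Gaussian MF centers.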

3.3 FFA

The FFA is a new firefly-inspired algorithm and was first proposed by Yang in 2010 [24]. In the FFA, the light intensity \(I\left( r \right)\) and the attractiveness of a firefly \(\beta\) are the two important parameters, and they can be defined as follows [24]:

$$I\left( r \right) = I_{0} e^{ - \gamma r}$$
(10)
$$\beta = \beta_{0} e^{{ - \gamma r^{2} }}$$
(11)

where \(I_{0}\) is the original light intensity; \(\gamma\) is the light absorption coefficient; and \(\beta_{0}\) is the attractiveness at \(r = 0\). \(r\) is the distance between the ith firefly and the jth firefly and can be determined by

$$r_{ij} = \left\| {x_{i} - x_{j} } \right\| = \sqrt {\sum\limits_{k = 1}^{n} {\left( {x_{i,k} - x_{j,k} } \right)^{2} } }$$
(12)

The position update of firefly i attracted by another brighter firefly j can be determined by

$$x_{i} = x_{i} + \beta_{0} e^{{ - \gamma r_{ij}^{2} }} \left( {x_{j} - x_{i} } \right) + \alpha \varepsilon_{i}$$
(13)

where \(\alpha \varepsilon_{i}\) represents the randomized term and \(\beta_{0} e^{{ - \gamma r_{ij}^{2} }} \left( {x_{j} - x_{i} } \right)\) is the attraction term. A flowchart of the FFA is shown in Fig. 3.

Fig. 3 Flowchart of the FFA
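The attraction and random-walk update in Eqs. (11)–(13) can be sketched for a generic minimization problem. This is an illustrative sketch only: a lower objective value is treated as a brighter firefly, the defaults for \(\gamma\), \(\beta_{0}\) and \(\alpha\) follow the values reported in Sect. 4.1, and the function name `firefly_minimize`, the population size, the search bounds and the uniform random step are assumptions.

```python
import numpy as np

def firefly_minimize(objective, dim, n_fireflies=20, n_iter=100,
                     gamma=1.0, beta0=2.0, alpha=0.2,
                     lower=-5.0, upper=5.0, seed=0):
    """Minimize `objective` with a basic firefly algorithm; returns (best_x, best_cost)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, size=(n_fireflies, dim))
    cost = np.array([objective(x) for x in X])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower cost)
                    r2 = np.sum((X[i] - X[j]) ** 2)           # squared distance r_ij^2
                    beta = beta0 * np.exp(-gamma * r2)        # attractiveness, Eq. (11)
                    eps = rng.uniform(-0.5, 0.5, size=dim)    # random step (assumed uniform)
                    X[i] = X[i] + beta * (X[j] - X[i]) + alpha * eps  # Eq. (13)
                    X[i] = np.clip(X[i], lower, upper)        # keep inside the bounds
                    cost[i] = objective(X[i])
    best = int(np.argmin(cost))
    return X[best].copy(), float(cost[best])
```

In this variant the currently brightest firefly never moves, so the best cost found cannot get worse from one iteration to the next.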

3.4 Hybrid model based on ANFIS-FFA

This section introduces the developed ANFIS-FFA model in detail. In the proposed ANFIS-FFA model, the ANFIS parameters are optimized by the FFA. Similar to the traditional ANFIS model, the proposed ANFIS-FFA model has five layers. The nodes of the first layer represent the five input variables (i.e., water, cement, fine aggregate, coarse aggregate, and superplasticizer). The nodes of the second and third layers represent the MFs of input variables and fuzzy logic rules, respectively. The fourth layer is the defuzzification layer, and the nodes in this layer use the consequent part of the Sugeno fuzzy model. The fifth layer’s output is the compressive strength of HSC. The FFA is mainly used to determine the best weights between layers 4 and 5, as well as to train the MFs according to the input variables. The proposed ANFIS-FFA model begins by dividing the whole dataset into two parts: training datasets and testing datasets. In the next step, the FCM clustering approach is used to generate clusters, and the ANFIS model is subsequently constructed based on the FCM results. Then, the premise and consequent parameters of the ANFIS are initialized randomly and updated in each iteration by the FFA. This step is repeated until the stopping criterion (e.g., the maximum number of iterations) is satisfied. The optimal solution is then transferred to the ANFIS model. Figure 4 illustrates the flowchart of the developed ANFIS-FFA model.

Fig. 4 Flowchart of the ANFIS-FFA
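One way to connect the ANFIS to a population-based optimizer such as the FFA is to encode all premise and consequent parameters in a single flat vector and use the training RMSE as the cost to be minimized. The sketch below illustrates that encoding under stated assumptions: the helper names (`unpack`, `anfis_predict`, `training_rmse`), the parameter ordering, the fixed number of rules and the synthetic data are hypothetical and are not taken from this study. Only the 244/80 random split mirrors Sect. 2.

```python
import numpy as np

def unpack(theta, n_rules, n_inputs):
    """Split a flat parameter vector into premise (centers, sigmas) and consequent parts."""
    k = n_rules * n_inputs
    centers = theta[:k].reshape(n_rules, n_inputs)
    # abs plus a small floor keeps the Gaussian widths strictly positive
    sigmas = np.abs(theta[k:2 * k]).reshape(n_rules, n_inputs) + 1e-6
    consequents = theta[2 * k:].reshape(n_rules, n_inputs + 1)
    return centers, sigmas, consequents

def anfis_predict(X, centers, sigmas, consequents):
    """Vectorized first-order Sugeno ANFIS output for a batch X of shape (N, m)."""
    diff = X[:, None, :] - centers[None, :, :]                    # (N, n, m)
    w = np.exp(-(diff ** 2) / (2.0 * sigmas ** 2)).prod(axis=2)   # firing strengths (N, n)
    w_bar = w / np.maximum(w.sum(axis=1, keepdims=True), 1e-300)  # normalized strengths
    f = consequents[:, 0] + X @ consequents[:, 1:].T              # rule outputs (N, n)
    return np.sum(w_bar * f, axis=1)

def training_rmse(theta, X, y, n_rules):
    """Cost function for the optimizer: RMSE of the ANFIS encoded by theta."""
    centers, sigmas, consequents = unpack(theta, n_rules, X.shape[1])
    y_hat = anfis_predict(X, centers, sigmas, consequents)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Synthetic placeholder data (NOT the HSC dataset), split 244/80 at random
rng = np.random.default_rng(0)
X = rng.random((324, 5))
y = X.sum(axis=1)
idx = rng.permutation(324)
train, test = idx[:244], idx[244:]

n_rules = 3                               # assumed; in practice set by the FCM clusters
theta0 = rng.normal(size=16 * n_rules)    # 2*5 premise + 6 consequent values per rule
print(training_rmse(theta0, X[train], y[train], n_rules))
```

An optimizer then only needs to call `training_rmse` on candidate vectors, and the best vector found is unpacked back into the ANFIS and evaluated on the held-out test indices.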

3.5 Performance evaluation criteria

To evaluate the performances of the proposed ANFIS-FFA method, the traditional ANFIS, the back propagation neural network (BPNN) and the extreme learning machine (ELM), four statistical benchmark indices, including the root mean squared error (RMSE), the correlation coefficient (R), the mean absolute error (MAE), and the mean absolute percentage error (MAPE), were adopted in this study and were expressed as follows [25, 26]:

$${\text{RMSE = }}\sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }}{n}}$$
(14)
$${\text{R}} = \frac{{\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \overline{y} } \right)\left( {\hat{y}_{i} - \overline{\hat{y}} } \right)} }}{{\sqrt {\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \overline{y} } \right)^{2} } \sum\nolimits_{i = 1}^{n} {\left( {\hat{y}_{i} - \overline{\hat{y}} } \right)^{2} } } }}$$
(15)
$${\text{MAE}} = \frac{{\sum\nolimits_{i = 1}^{n} {\left| {y_{i} - \hat{y}_{i} } \right|} }}{n}$$
(16)
$${\text{MAPE}} = \frac{{\sum\nolimits_{i = 1}^{n} {\left| {\frac{{y_{i} - \hat{y}_{i} }}{{y_{i} }}} \right|} }}{n}$$
(17)

where \(y_{i}\) and \(\hat{y}_{i}\) are the actual and predicted compressive strengths of HSC, respectively; \(\overline{y}\) and \(\overline{\hat{y}}\) are the means of the actual and predicted compressive strengths of HSC, respectively; and n is the total number of data samples.
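The four indices in Eqs. (14)–(17) can be computed in a few lines. The sketch below is illustrative: the function name `regression_metrics` is hypothetical, and MAPE is returned as a fraction, so it would be multiplied by 100 to obtain a percentage.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, R, MAE and MAPE (as a fraction) for paired actual/predicted values."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    err = y - y_hat
    rmse = np.sqrt(np.mean(err ** 2))        # Eq. (14)
    r = np.corrcoef(y, y_hat)[0, 1]          # Pearson correlation, Eq. (15)
    mae = np.mean(np.abs(err))               # Eq. (16)
    mape = np.mean(np.abs(err / y))          # Eq. (17); requires nonzero actual values
    return {"RMSE": float(rmse), "R": float(r), "MAE": float(mae), "MAPE": float(mape)}

# Made-up strengths in MPa, for illustration only
print(regression_metrics([50.0, 60.0, 70.0, 80.0], [51.0, 59.0, 70.0, 82.0]))
```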

4 Results and discussion

4.1 Evaluation of the proposed models

Note that the selection of FFA parameters has an important influence on the convergence rate and forecasting accuracy. For this purpose, the optimal parameters of the FFA were determined by the trial and error approach, and they are given as follows: \(\gamma = 1.0\), \(\beta_{0} = 2.0\), and \(\alpha = 0.2\). Table 2 lists different parameter types and their values used to train the ANFIS.

Table 2 Different parameter types and their values used to train the ANFIS

Figure 5 plots the MFs of the five main constituents of HSC. Figure 6 compares the results of the actual and predicted compressive strengths of HSC for the training data samples and testing data samples. Figure 7 plots the fitting relationship between the actual and predicted compressive strengths of HSC for all the data samples. Figure 7 shows that the correlation coefficient R = 0.9999. Obviously, the predicted compressive strength agrees well with the actual compressive strength of HSC.

Fig. 5 MFs of the five main constituents of HSC: a water; b cement; c fine aggregate; d coarse aggregate; and e superplasticizer

Fig. 6 Comparison between the actual and predicted compressive strengths of HSC. a Training data samples and b testing data samples

Fig. 7 Relationship between the actual and predicted compressive strengths of HSC for all the data samples

Figure 8 plots the three-dimensional graphs of the predicted compressive strength vs. the five main constituents of HSC. Using Fig. 8, the region of constituent contents that yields a desired compressive strength of HSC can be identified.

Fig. 8 Three-dimensional graphs of the predicted compressive strength vs. the five main constituents of HSC

4.2 Comparison of the applied models

Figure 9 and Table 3 show the forecasting performances of the proposed ANFIS-FFA method, the traditional ANFIS model, the back propagation neural network (BPNN) model [9] and the extreme learning machine (ELM) model [9] in terms of four statistical indices.

For the training data samples, the R, MAE, RMSE and MAPE values of the developed ANFIS-FFA method are 1, 0 MPa, 0 MPa and 0, respectively, while the corresponding values are 1, 0 MPa, 0 MPa and 0 for the traditional ANFIS method, 0.9952, 0.7292 MPa, 0.9197 MPa and 1.467% for the BP model, and 0.9976, 0.5014 MPa, 0.6538 MPa and 1.0198% for the ELM model.

For the testing data samples, the R, MAE, RMSE and MAPE values of the developed ANFIS-FFA method are 0.9965, 0.086 MPa, 0.1673 MPa and 0.2%, respectively, while the corresponding values are 0.8788, 0.7315 MPa, 1.0148 MPa and 1.78% for the traditional ANFIS method, 0.9938, 0.7888 MPa, 1.0507 MPa and 1.543% for the BP model, and 0.9937, 0.9205 MPa, 1.1344 MPa and 1.8178% for the ELM model.

For all the data samples, the R, MAE, RMSE and MAPE values of the developed ANFIS-FFA method are 0.9999, 0.0212 MPa, 0.0831 MPa and 0.05%, respectively, while the corresponding values are 0.9987, 0.1806 MPa, 0.5042 MPa and 0.44% for the traditional ANFIS method, 0.9949, 0.7372 MPa, 0.9498 MPa and 1.4704% for the BP model, and 0.9965, 0.6049 MPa, 0.7998 MPa and 1.2169% for the ELM model.

The results show that both the ANFIS and ANFIS-FFA models are able to predict the compressive strength of HSC quite accurately; however, the ANFIS-FFA model exhibits greater accuracy than the nonoptimized ANFIS model, with higher values of R and lower values of MAE, RMSE and MAPE. A likely reason is the robustness of the FFA, which is coupled to the ANFIS model and contributes to the optimization of the membership function parameters.
Similar results have also been reported in previous studies. For example, Yaseen et al. [27, 28] conducted streamflow forecasting using a hybrid ANFIS-FFA model and concluded that the ANFIS-FFA model was not only superior to the ANFIS model but also behaved like a parsimonious model for streamflow prediction; this agreement supports the plausibility of the simulation results obtained in this paper.

Fig. 9 Performance comparison among different models

Table 3 Performance comparisons of the traditional ANFIS, proposed ANFIS-FFA, BP and ELM models

In addition, Table 3 shows that the performance of the traditional ANFIS model surpasses that of the BP and ELM models for all data samples, with higher values of R and lower values of MAE, RMSE and MAPE. Overall, the results showed that the forecasting performance rank was ANFIS-FFA > ANFIS > ELM > BP model. We conclude that the proposed ANFIS-FFA method is a sound modeling approach that could be adopted for the prediction of the compressive strength of HSC.

4.3 Parametric analysis

To investigate the influence of each input variable (e.g., cement, water, fine aggregate, coarse aggregate and superplasticizer) on the compressive strength of HSC, a parametric study was conducted based on the proposed ANFIS-FFA model. In each experiment, one of the five input parameters was excluded, and only four of them were employed in the proposed ANFIS-FFA model. The parametric experimental results are listed in Table 4.

Table 4 Parametric analysis of the parameters governing the compressive strength of HSC in the proposed ANFIS-FFA model

As seen in Table 4, the ANFIS-FFA model with five input variables has the highest R (0.9987) and the lowest RMSE (0.5042 MPa), MAE (0.012 MPa) and MAPE (0.05%). For the other five ANFIS-FFA models, the ANFIS-FFA model without water has the lowest R (0.8354) and highest RMSE (3.8875 MPa), MAE (4.225 MPa) and MAPE (8.448%). In contrast, the ANFIS-FFA model without superplasticizer has the highest R (0.9822) and lowest RMSE (1.9103 MPa), MAE (1.583 MPa) and MAPE (3.379%). The results show that the effect of the five input parameters on the compressive strength of HSC can be ranked in the order water > fine aggregate > cement > coarse aggregate > superplasticizer.
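The leave-one-input-out procedure behind this ranking can be sketched generically: drop one input at a time, refit, and sort the inputs by how much the test error grows. In the sketch below, an ordinary least-squares fit is used purely as a lightweight stand-in for the ANFIS-FFA model so that the example stays short and runnable. The function names and the choice of RMSE as the ranking index are assumptions.

```python
import numpy as np

def fit_predict_linear(X_train, y_train, X_test):
    """Stand-in model (ordinary least squares with intercept); NOT the ANFIS-FFA model."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def leave_one_input_out(X_train, y_train, X_test, y_test, names):
    """Refit with each input removed; return (name, test RMSE) pairs, largest RMSE first."""
    results = {}
    for k, name in enumerate(names):
        keep = [j for j in range(X_train.shape[1]) if j != k]
        y_hat = fit_predict_linear(X_train[:, keep], y_train, X_test[:, keep])
        results[name] = float(np.sqrt(np.mean((y_test - y_hat) ** 2)))
    # The input whose removal causes the largest error is ranked as most influential
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)
```

Replacing `fit_predict_linear` with a routine that trains and evaluates the hybrid model would give the same style of ranking as reported in Table 4.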

4.4 Advantages, limitations and future improvement

This study, which utilized the proposed ANFIS-FFA model instead of the traditional ANFIS model, was highly successful in improving the prediction accuracy of the compressive strength of HSC. Although this has verified the better utility of an ANFIS-FFA model over a traditional ANFIS model, the case study used only the data from Ref. [9], and predictions are meaningful only in this context. For practical application, the validity of the proposed ANFIS-FFA model should be evaluated by using more experimental data.

In this study, the fuzzy concept with MFs [0, 1] was utilized in the ANFIS model; however, in a follow-up study, the proposed ANFIS-FFA model could be improved by utilizing an interval-valued fuzzy robust programming approach [29]. The interval parameters in an interval-valued fuzzy system could be directly optimized by the FFA [27]. In addition, Bayesian model averaging or ensemble Kalman filter techniques could also be used to improve the proposed ANFIS-FFA model [30, 31].

To define how each point in the input space is mapped to a membership value (e.g., degree of membership) between 0 and 1, Gaussian membership was used to build the ANFIS model in this study. Although our results showed good performance, further studies should investigate several other MFs, such as triangular, trapezoidal, sigmoidal, \(\pi\)-shaped and generalized bell-shaped MFs, with different partitioning of training and testing datasets. Note that each MF in an ANFIS model has its own merits and drawbacks, and the optimality of any prescribed MF is expected to rely on the particular modeling problem at hand; thus, independent research that compares the various MFs is needed. Although this is a useful task, it was beyond the scope of the present study and thus could be an interesting subject in future work.

5 Conclusions

In this study, a hybrid ANFIS-FFA model was proposed to predict the compressive strength of HSC. To construct the proposed ANFIS-FFA model, five main constituents of HSC were considered as the input variables. These parameters are (1) cement, (2) water, (3) fine aggregate, (4) coarse aggregate, and (5) superplasticizer. The proposed ANFIS-FFA model was compared against three prediction models: (1) the traditional ANFIS model, (2) the BPNN model, and (3) the ELM model. In addition, detailed parametric analysis was carried out to investigate the effect of each input variable on the compressive strength of HSC. The following conclusions can be drawn from this study:

  1. The proposed ANFIS-FFA model provides accurate prediction of the compressive strength of HSC, producing results that are more accurately fitted to the measured results than the traditional ANFIS model and two available models in the literature. The proposed ANFIS-FFA model has the lowest RMSE, MAE, and MAPE and the highest R values compared with the other three models.

  2. The R of the proposed ANFIS-FFA model is 1 for the training datasets and 0.9965 for the testing datasets. The results showed that the forecasting performance rank was ANFIS-FFA > ANFIS > ELM > BP model.

  3. The parametric analysis results show that the effect of five input parameters on the compressive strength of HSC can be ranked in the order water > fine aggregate > cement > coarse aggregate > superplasticizer.

  4. Although the proposed ANFIS-FFA model relative to the traditional ANFIS model was highly successful in improving the prediction accuracy of the compressive strength of HSC, further research should be performed in future work.