Introduction

Although water treatment plants (WTPs) are mostly operated by experts and experienced operators, developing intelligent data-driven models is essential for enhancing the operation and control quality of the treatment processes. Using such models to predict the treatment units’ responses to various technical, physical, chemical, and biological features would enhance the performance of WTPs. Nowadays, intelligent data-driven models are well-known techniques for predicting the dynamic state of environmental systems (May et al. 2008; Wu et al. 2014; Hawari et al. 2017; Saadatpour et al. 2020). Data-driven models could be instrumental in operating WTPs when the use of physically based numerical models and/or human resources is subject to restrictions. However, developing data-driven models requires a proper and deep understanding of the theoretical foundations of the processes and of the physical, chemical, biological, and technical concepts that generate the observed system dynamics in a WTP (Li et al. 2014; Wu et al. 2014; Li et al. 2015a, b; Saadatpour et al. 2020).

Various techniques have been suggested within the literature to develop data-driven models, based on their application to a range of environmental systems (Pascual-Pañach et al. 2021). The artificial neural network (ANN) is an advanced data-driven method inspired by the nervous system of biological organisms and has been extensively applied in various environmental disciplines (Onukwuli et al. 2021). Typically, an ANN structure consists of an input layer, one or more hidden layers, and an output layer. These layers are made up of processing units, named neurons, which are interconnected through the weights of the neural network (Samadi et al. 2021a). ANN can approximate all types of nonlinear relationships between inputs and outputs if appropriate data-preprocessing techniques are used (Rajendra et al. 2009). Najah et al. (2009) proposed a model for predicting total dissolved solids (TDS), electrical conductivity (EC), and turbidity in the Johor river basin using a three-layered feed-forward backpropagation ANN. According to their results, the water quality parameters were simulated correctly and predicted with a mean absolute error of about 10%. Nasr et al. (2012) applied an ANN technique to simulate the performance of the El-Agamy wastewater treatment plant in Egypt. As they reported, chemical oxygen demand (COD), biochemical oxygen demand (BOD), and total suspended solids (TSS) were estimated by the ANN model with a high correlation coefficient (R2 > 0.9). In another study, Giwa et al. (2016) investigated the effect of mixed liquor suspended solids (MLSS), dissolved oxygen (DO), EC, and pH on the removal efficiency of COD, phosphate (\({\text{PO}}_{4}^{3-}\text{-P}\)), and ammonium (\({\text{NH}}_{4}^{+}\text{-N}\)) from wastewater in an integrated electrically enhanced membrane bioreactor. The ANN model based on the Levenberg–Marquardt backpropagation algorithm predicted the effluent concentration of the contaminants with a high correlation coefficient (R2 = 0.99) (Giwa et al. 2016).

Despite the fact that ANN models are capable of predicting water quality parameters, in some cases where the input parameters are unclear, these techniques encounter problems in determining nonlinear relationships. Some studies revealed that an adaptive neuro-fuzzy inference system (ANFIS) may be a better alternative for such problems (Fu et al. 2020). Indeed, it has the advantages of both ANN and fuzzy inference system (FIS) to handle the uncertainty and noisy data (Zaghloul et al. 2020). Kim and Parnichkun (2017) proposed a hybrid of k-means-ANFIS to predict the settled water turbidity and determine the optimum coagulant dosage using full-scale historical data. Based on the results, sub-models constructed by the k-means-ANFIS were superior to single ANFIS and ANN. In another research conducted by Hawari et al. (2017), fuzzy logic-based and multiple linear regression (MLR) models were used to predict the treated wastewater volume from a multimedia filter under different influent flow rates and turbidities. As they explained, although the regression model had higher accuracy in predicting the treated wastewater, the fuzzy-based model, due to considering the uncertainties in input parameters, was more reliable (Hawari et al. 2017).

Support vector regression (SVR) is another supervised machine learning technique that can alleviate the limitations of ANNs. Unlike ANNs, SVR has a simple geometric interpretation, and only a few model parameters need to be adjusted (Parveen et al. 2017). Some previous studies have shown that SVR-based models can be superior to MLR and ANN in predicting the adsorption process (Parveen et al. 2017, 2019). Li et al. (2021) used machine learning methods including SVR and Gaussian process regression (GPR) to determine the relationship between the hydraulic conditions and the efficiency of the flocculation process. They reported that the SVR model predicted the turbidity removal efficiencies, based on various hydraulic conditions, better than the GPR model. However, the memory required to work with a large dataset and the choice of the best kernel function are significant challenges for SVR (Zaghloul et al. 2020).

Unlike the physically based numerical models, which are applied to depict and quantify the relationships between different input–output variables, the artificial intelligence (AI) techniques are capable of considering the uncertainties and provide a fast and accurate way to determine the system responses without depicting the structures of the processes (Saadatpour et al. 2020).

Coagulation–flocculation is one of the most crucial water treatment processes known as an economic and robust method for destabilizing suspended and colloidal particles and removing turbidity from water (Metcalf and Eddy 2003; Onukwuli et al. 2021). The process is severely affected by operational and environmental factors such as pH, initial turbidity, coagulant dosage, mixing speed, process time, and temperature (Gupta et al. 2016; Aboubaraka et al. 2017). The variety of effective factors makes the process highly complicated. This raises the need for an efficient predictive model for the process (Zhu et al. 2021).

Owing to their cost-effectiveness and more or less satisfactory performance, conventional coagulants such as metal salts have been widely used in water treatment works. Nonetheless, the traditional coagulants have drawbacks such as generating large sludge volumes, inefficiency at low temperatures, and a reported link to Alzheimer’s disease, making them a threat to human health and the environment (Crini and Lichtfouse 2019; Nnaji et al. 2020; Ezemagu et al. 2021).

Graphene oxide (GO) is a two-dimensional carbon-based nanomaterial that, owing to its special surface properties and functional groups, has recently been examined as a coagulant in water and wastewater treatment studies (Yang et al. 2013; Aboubaraka et al. 2017; Rezania et al. 2021). Moreover, GO is superior to conventional coagulants in terms of biodegradability (Sanchez et al. 2012). However, due to the novelty of the subject and the lack of studies in this field, there are very few reports regarding the modeling of the GO-based coagulation–flocculation process. Rezania et al. (2021) investigated the GO performance as a coagulant in turbidity removal from water and simulated the process through response surface methodology (RSM). Although RSM is appropriate for modeling quadratic processes and provides comprehensive information on sensitivity analysis and the interaction of independent operating parameters (Igwegbe et al. 2021), not all nonlinear systems are well described by second-order polynomials (Bhatti et al. 2011). In addition, RSM modeling requires a predefined acceptable fitting function (Karthic et al. 2013) and the determination of a suitable range for each input parameter (Maran and Priya 2015).

To the best of the authors’ knowledge, no reports have been published so far on modeling the GO-based coagulation–flocculation process using the aforesaid AI techniques, or on comparing the models’ performance and determining the most appropriate technique for predicting the process efficiency. As discussed above, successful applications of the ANN, ANFIS, and SVR techniques have been reported for many environmental engineering problems, especially in predicting water and wastewater treatment processes. These data mining techniques offer advantages over conventional modeling, such as the ability to handle large amounts of noisy data even in dynamic and nonlinear frameworks, especially when the underlying physical, chemical, or biological process is not completely understood. All these factors, along with features such as generality, user-friendliness, and the availability of ready-made applications, provided a strong incentive to evaluate and compare the performance of the aforementioned techniques in predicting the turbidity removal from water using graphene oxide (GO).

In addition, it is important to note that the coagulation performance and mechanisms of GO nanoparticles differ from those of conventional coagulants: the coagulation properties of GO are due to its surface characteristics, while those of conventional coagulants result from their hydrolysis in water. Therefore, determining an appropriate AI technique for modeling the GO-based coagulation–flocculation process will be valuable.

Given the reasons discussed above, the main objective of the present work was to develop and compare the capability of the aforementioned AI-based data mining models, i.e., ANN, SVR, and ANFIS, in predicting the GO performance as a coagulant in the removal of turbidity from drinking water. The prediction performance of the AI-based models was compared with each other and with that of the response surface methodology (RSM) model previously reported by the authors (Rezania et al. 2021). The experiments were performed using a jar test instrument, and the partial mutual information (PMI) algorithm was used to determine the appropriate input variables. The models’ prediction performance was compared using statistical indicators.

Methodology

Data collection

The data used for the development of the models were generated by the authors in a recent study (Rezania et al. 2021), in which they evaluated the GO performance as a novel coagulant in turbidity removal from water. The study was performed using single-layer GO with a layer thickness of 0.7–1.4 nm. To prepare turbid samples, garden soil particles that passed through a No. 200 sieve were dispersed in 2 L of tap water. In order to obtain a uniform dispersion, the stock suspension was first stirred at 100 rpm for 1 h and then left for 24 h for complete hydration of the particles. In the next step, the suspension was stirred again and allowed to settle for 60 min. The obtained supernatant was used to prepare samples with different levels of turbidity (Rezania et al. 2021). A six-paddle jar test apparatus was used for performing the coagulation–flocculation process. The turbid samples were first agitated at a rapid mixing rate of 200 rpm for 2 min and then slowly stirred at 50 rpm for 15 min at room temperature. The effect of pH (3–11), GO dosage (2.5–30 mg/L), initial turbidity (25–300 NTU), rapid mixing time (1–5 min), and slow mixing time (10–40 min) on the turbidity removal efficiency was evaluated through the mentioned jar test procedure. After performing 79 one-factor-at-a-time (OFAT) tests, the process was simulated through RSM (Rezania et al. 2021). A central composite design (CCD) containing 20 experimental runs, with 8 factorial points, 6 axial (star) points, and 6 center points, was selected for building quadratic models. To reduce the experimental errors, all experiments were carried out in randomized order. The second-order polynomial model in coded form obtained by Rezania et al. (2021) is as follows:

$$Y \left(\%\right)=92.56+2.16 {X}_{1}-10.5 {X}_{2}+5.63 {X}_{3}-1.19 {X}_{1}{X}_{2}+4.52 {X}_{2}{X}_{3}-5.5 {{X}_{1}}^{2}-4.23 {{X}_{2}}^{2}-1.56{{X}_{3}}^{2}$$
(1)
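As a minimal sketch, Eq. 1 can be evaluated directly once the factor levels are expressed in coded form. The function name `rsm_removal` is hypothetical, and the mapping of X1, X2, and X3 to the physical variables (and the conversion from actual to coded levels) is not given in this section, so it is left to the caller.

```python
def rsm_removal(x1: float, x2: float, x3: float) -> float:
    """Evaluate the coded second-order polynomial of Eq. 1.

    x1, x2, x3 are coded factor levels (assumption: already converted
    from physical units by the caller). Returns the predicted turbidity
    removal efficiency in percent.
    """
    return (
        92.56
        + 2.16 * x1 - 10.5 * x2 + 5.63 * x3      # linear terms
        - 1.19 * x1 * x2 + 4.52 * x2 * x3        # interaction terms
        - 5.5 * x1 ** 2 - 4.23 * x2 ** 2 - 1.56 * x3 ** 2  # quadratic terms
    )


# At the design center (all coded levels zero) only the intercept remains.
print(rsm_removal(0.0, 0.0, 0.0))
```

Because the polynomial is only fitted inside the design region, values computed for coded levels far outside that region should be treated with caution.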

Input variable selection

Selecting variables relevant to the target is one of the most important issues regarding the development of data-driven models. Furthermore, the performance of such models can be adversely affected if either too few or too many inputs are selected (Wu et al. 2014; Li et al. 2015a,b). Generally, input data selection in an environmental modeling context is a complicated issue due to a lack of understanding of the underlying physical–chemical and biological processes. For this reason, the partial mutual information (PMI) algorithm introduced by Sharma (2000) is commonly used to determine the appropriate input data.

The PMI value for the output variable y and the input variable x, given a set of already selected inputs {z}, is calculated as follows:

$$\mathrm{PMI}=\iint f\left(x^{\prime},y^{\prime}\right)\log\left[\frac{f\left(x^{\prime},y^{\prime}\right)}{f\left(x^{\prime}\right)f\left(y^{\prime}\right)}\right]\mathrm{d}x^{\prime}\,\mathrm{d}y^{\prime}$$
(2)

where \(x^{\prime}=x-E[x|z]\), \(y^{\prime}=y-E[y|z]\), the operator E[.] denotes the expectation operation, \(f\left(y^{\prime}\right)\) and \(f\left(x^{\prime}\right)\) are the marginal probability density functions (pdfs), and \(f\left(x^{\prime},y^{\prime}\right)\) is the joint probability density function. The greater the PMI score, the stronger the influence of the input variable on the response.
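The quantity in Eq. 2 can be illustrated with a simplified sketch. The code below is not the PMI algorithm itself: it estimates plain mutual information with a histogram, which corresponds to Eq. 2 only when no inputs have been selected yet (so that x′ = x and y′ = y). PMI implementations typically use kernel density estimates and residuals from a conditional expectation instead. The function name and the number of bins are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=4):
    """Histogram estimate of the mutual information I(x; y) in nats."""

    def to_bins(values):
        # Map each value to an equal-width bin index in [0, bins - 1].
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    count_x, count_y = Counter(bx), Counter(by)
    count_xy = Counter(zip(bx, by))

    mi = 0.0
    for (i, j), c in count_xy.items():
        # p(x, y) * log[p(x, y) / (p(x) p(y))] with empirical frequencies
        mi += (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
    return mi
```

With this estimator, two independent binary sequences give a score of zero, while a sequence compared with itself gives its own entropy, which matches the interpretation that a higher score indicates a more informative input.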

Based on the PMI analysis results (described in the third to the fifth section), among the evaluated parameters, GO dosage, pH, and initial turbidity were determined as the most effective independent variables for developing the models. This result is consistent with the explanations of Rezania et al. (2021) that changing the rapid and slow mixing times had a negligible effect on turbidity removal efficiency using GO. Similarly, Naeem et al. (2018) showed that because of the abundant active sites on GO-based nanocomposite, a significant adsorptive removal of contaminant particles occurred in the early minutes of the process, and increase in the contact time did not have much effect on process efficiency.

Data splitting

In this step, the available data are split into calibration (including training and test datasets if cross-validation is used) and validation data sets. Data splitting methods can be categorized as unsupervised or supervised (Maier et al. 2010). In the present study, random data splitting, the most commonly used unsupervised data splitting method, was applied (Mirri et al. 2020). As a result, the 79 experimental data points obtained from the OFAT tests were divided into calibration (73%) and validation (27%) sets and used to calibrate and validate the selected artificial intelligence models, i.e., ANN, ANFIS, and SVR. The validation data were also applied to Eq. 1 to calculate the turbidity removal efficiency predicted by the RSM model. Finally, all models were evaluated and compared with one another in terms of their performance in predicting turbidity removal from water using GO.
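A random split of this kind can be sketched as follows. The function name, the seed, and the rounding rule are assumptions; the text only gives the percentages, so the exact set sizes produced here (58 and 21) may differ from the authors' split.

```python
import random


def random_split(n_samples, calibration_fraction=0.73, seed=0):
    """Randomly partition sample indices into calibration and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_cal = round(calibration_fraction * n_samples)
    return indices[:n_cal], indices[n_cal:]


cal_idx, val_idx = random_split(79)
print(len(cal_idx), len(val_idx))  # prints "58 21"
```

Fixing the seed keeps the partition identical across the models being compared, so that differences in validation error reflect the models rather than the split.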

Artificial intelligence models

Artificial neural network (ANN)

Owing to their capability of learning from large data sets, ANNs are applicable to predicting nonlinear functions (Samadi et al. 2021b). The original database should be large enough to be divided into calibration and validation sets, using either a supervised or an unsupervised method (Maier et al. 2010). Training data are used during the learning process to find the pattern between variables and response(s).

Validation data are also utilized to evaluate network performance. In the present work, a feed-forward backpropagation neural network, one of the most popular ANN architectures, developed by Rumelhart et al. (1986), was created in MATLAB 2020b. In the backpropagation algorithm, once the output has been calculated, the difference between the obtained and the desired response is computed; the weights of the network are then updated with the aim of minimizing the loss function. The number of hidden layers is a significant aspect of neural network design since it can affect the accuracy of the response. As there is still no specific method for determining the appropriate network architecture before the learning step, it is often chosen by a trial-and-error process. Nevertheless, for the vast majority of problems, to avoid the risk of over-fitting, using one hidden layer with sufficient neurons is more reasonable than increasing the number of hidden layers (Wu et al. 2015). For this reason, an ANN with one hidden layer was used in this study. The network structure used in this paper (see Fig. 1) comprises three layers, which are described in detail in the Results and discussion section.

Fig. 1 Structure of the developed ANN model

Adaptive neuro-fuzzy inference systems (ANFIS)

The adaptive neuro-fuzzy inference system (ANFIS), proposed by Jang (1993), is a composite of ANN and fuzzy inference system (FIS). Adaptive networks reduce the time required for processing large datasets by finding the optimal network structure automatically. In this approach, parameters such as weights and biases are adapted in the training process, which reduces the error rate. The FIS, in turn, uses “IF…THEN” rules as well as the connectors “OR” and “AND” to map inputs to output(s). Every FIS consists of three main parts: fuzzy rules, a database, and a reasoning mechanism (Onukwuli et al. 2021). The rule base of the FIS with two inputs (x1 and x2) and one output (f), based on the Takagi–Sugeno type (Takagi and Sugeno 1983; Anadebe et al. 2020), can be shown as follows:

$$\begin{array}{cc}\mathrm{Rule}1:\;\mathrm{if}\;{\mathrm x}_1\;\mathrm{is}\;{\mathrm A}_1\;\mathrm{and}\;{\mathrm x}_2\;\mathrm{is}\;{\mathrm B}_1&\mathrm{then}\;{\mathrm f}_1={\mathrm p}_1{\mathrm x}_1+{\mathrm q}_1{\mathrm x}_2\end{array}$$
(3)
$$\begin{array}{cc}\mathrm{Rule}2:\;\mathrm{if}\;{\mathrm x}_1\;\mathrm{is}\;{\mathrm A}_2\;\mathrm{and}\;{\mathrm x}_2\;\mathrm{is}\;{\mathrm B}_2&\mathrm{then}\;{\mathrm f}_2={\mathrm p}_2{\mathrm x}_1+{\mathrm q}_2{\mathrm x}_2\end{array}$$
(4)

where A1 and A2 and B1 and B2 are the fuzzy sets for input parameters x1 and x2, respectively, and p1 and q1 and p2 and q2 are the consequent parameters obtained by the least square method. The detailed descriptions of the general structure of the ANFIS (see Fig. 2) are expressed as:

Fig. 2 ANFIS architecture

  • Layer 1: every node in this layer is an adaptive node that computes the membership value of an input variable. Generalized bell-shaped, Gaussian, trapezoidal-shaped, and triangular-shaped are some popular types of membership functions. If the Gaussian membership function (μ) is adopted, the output of the node is calculated as follows:

    $${\mathrm\mu}_{\mathrm{Ai}}\left(\mathrm x\right)=\exp\left[-0.5\left(\frac{\mathrm x-{\mathrm c}_i}{{\mathrm\sigma}_i}\right)^2\right]$$
    (5)

    where \({\text{x}}\) is the input to node i, Ai is the linguistic variable, and (σi, \({\text{c}}_{i}\)) are the premise parameters. Fuzzification occurs in this layer.

  • Layer 2: in this layer, the circle (fixed) nodes, labeled Π, multiply the incoming signals from the previous layer; the product represents the firing strength of the rule:

    $$\begin{array}{cc}{\mathrm\omega}_i\,={\mathrm\mu}_{\mathrm{Ai}}\times{\mathrm\mu}_{\mathrm{Bi}}&\mathrm i=1,2\end{array}$$
    (6)
  • Layer 3: each node in this layer calculates the normalized firing strength as

    $$\begin{array}{cc}{\overline{\mathrm\omega}}_i=\frac{{\mathrm\omega}_{\mathrm i}}{{\mathrm\omega}_1+{\mathrm\omega}_2}&\mathrm i=1,2\end{array}$$
    (7)
  • Layer 4: the output of each node can be obtained by multiplying the normalized firing strength with the first-order Sugeno model as follows:

    $${\overline{\mathrm\omega}}_i\;{\mathrm f}_i={\overline{\mathrm\omega}}_i\left[{\mathrm p}_i\times{\mathrm x}_1+{\mathrm q}_i\times{\mathrm x}_2\right]$$
    (8)
  • Layer 5: the single node in this layer computes the output of the model by Eq. 9:

    $$\sum\nolimits_i{\overline{\mathrm\omega}}_i\;{\mathrm f}_i=\frac{\sum_i{\mathrm\omega}_i\;{\mathrm f}_i}{\sum_i{\mathrm\omega}_i}$$
    (9)
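The five layers described above can be sketched as a single forward pass for the two-rule, two-input system of Eqs. 3 and 4. This is an illustrative sketch only: the function names and all parameter values are hypothetical, Gaussian membership functions are assumed, and no training of the premise or consequent parameters is included.

```python
import math


def gaussian_mf(x, c, sigma):
    """Gaussian membership value with center c and width sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)


def anfis_forward(x1, x2, premise_a, premise_b, consequents):
    """Forward pass of a two-rule first-order Sugeno ANFIS.

    premise_a:   [(c, sigma), (c, sigma)] for fuzzy sets A1, A2 on x1
    premise_b:   [(c, sigma), (c, sigma)] for fuzzy sets B1, B2 on x2
    consequents: [(p1, q1), (p2, q2)] as in Eqs. 3 and 4
    """
    # Layer 1: fuzzification of each input
    mu_a = [gaussian_mf(x1, c, s) for c, s in premise_a]
    mu_b = [gaussian_mf(x2, c, s) for c, s in premise_b]
    # Layer 2: firing strength of each rule (product of memberships)
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths
    total = sum(w)  # assumed non-zero; inputs far from every set could underflow
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted rule outputs
    rule_out = [wb * (p * x1 + q * x2) for wb, (p, q) in zip(w_bar, consequents)]
    # Layer 5: overall output
    return sum(rule_out)


# Hypothetical parameters: the input lies midway between both rule centers,
# so both rules fire equally and the output is the mean of f1 = 1 and f2 = 3.
y = anfis_forward(0.5, 0.5, [(0, 1), (1, 1)], [(0, 1), (1, 1)], [(1, 1), (3, 3)])
```

In a hybrid learning scheme, the consequent pairs would be fitted by least squares, as stated above, while the premise parameters would be adjusted iteratively.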

Support vector regression (SVR)

Support vector regression (SVR) is a machine learning algorithm that applies some basic concepts of the support vector machine (SVM) to complicated regression problems. In this study, the ɛ-SVR technique, the most widely used LibSVM model, was implemented on the MATLAB 2020b platform. For a dataset {(xi, yi), i = 1, 2, ⋅ ⋅ ⋅, N}, where \(x_i \in \mathbb{R}^{d}\) is the d-dimensional input and \(y_i \in \mathbb{R}\) is the target, the SVR function can be written as

$$\mathrm f\;\left(\mathrm x\right)=\mathrm\omega\times\mathrm\phi\left(\mathrm x\right)+\mathrm b$$
(10)

where ω is the parameter of the linear SVR, \({\text{b}}\) is the bias term, and ϕ(x) is a nonlinear mapping function. ω and \({\text{b}}\) can be estimated by minimizing the regression risk as follows:

$$\begin{array}{c}\mathrm{Minimize}:\;\frac12\left\|\mathrm\omega\right\|^2+\mathrm c\sum_{\mathrm i=1}^{\mathrm N}\left({\mathrm\xi}_{\mathrm i}+{\mathrm\xi}_{\mathrm i}^{\ast}\right)\\\mathrm{Subject}\;\mathrm{to}:\;\left\{\begin{array}{l}{\mathrm y}_{\mathrm i}-\mathrm f\left({\mathrm x}_{\mathrm i}\right)\leq{\mathrm\xi}_{\mathrm i}+\mathrm\varepsilon\\\mathrm f\left({\mathrm x}_{\mathrm i}\right)-{\mathrm y}_{\mathrm i}\leq{\mathrm\xi}_{\mathrm i}^{\ast}+\mathrm\varepsilon\\{\mathrm\xi}_{\mathrm i},\;{\mathrm\xi}_{\mathrm i}^{\ast}\geq0\end{array}\right.\end{array}$$
(11)

where \({\text{c}}\) represents the penalty parameter and \({\mathrm\xi}_{\mathrm i},{\mathrm\xi}_{\mathrm i}^\ast\) are slack variables. Using the Lagrangian function, the approximating function can be expressed by Eq. 12:

$$\mathrm f\;\left(\mathrm x\right)={\textstyle\sum_{\mathrm i=1}^{\mathrm N}}\left(\alpha_{\mathrm i}-\alpha_{\mathrm i}^\ast\right)\mathrm k\left(\mathrm x,{\mathrm x}_{\mathrm i}\right)+\mathrm b$$
(12)

where \(\alpha_{\text{i}}\) and \(\alpha_{\text{i}}^{*}\) are the Lagrange multipliers and \({\text{k}}\left(\text{x,}{\text{x}}_{\text{i}}\right)\) is the kernel function. In this work, the radial basis function (RBF) kernel was applied for constructing the SVR. Owing to its high computational efficiency and its capability of separating linear data, the RBF kernel is regarded as the most useful kernel function (Nie et al. 2020); it is defined as follows:

$$\mathrm K\left({\mathrm x}_{\mathrm i},{\mathrm x}_{\mathrm j}\right)=\exp\;\left(-\mathrm\gamma\;\left\|{\mathrm x}_{\mathrm i}-{\mathrm x}_{\mathrm j}\right\|^2\right)$$
(13)

where \(\gamma\) is the kernel parameter. As the accuracy of the SVR model can be affected by the values of \(\gamma\), \({\text{c}}\), and \(\varepsilon\), their best values were determined by a trial-and-error process.
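The roles of the RBF kernel, the kernel expansion of Eq. 12, and the ɛ-insensitive error can be sketched in a few lines. This is not the LibSVM implementation used in the study: the function names are hypothetical, the support vectors, combined multipliers, and bias are assumed to come from an already trained model, and no optimization is performed here.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """RBF kernel of Eq. 13 for two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel expansion of Eq. 12.

    dual_coefs holds one combined Lagrange-multiplier term per support
    vector (assumption: supplied by a trained model).
    """
    return sum(
        coef * rbf_kernel(x, sv, gamma)
        for coef, sv in zip(dual_coefs, support_vectors)
    ) + bias


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon are ignored; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

A larger \(\gamma\) makes each support vector's influence more local, a larger c penalizes errors outside the ɛ-tube more heavily, and a wider ɛ-tube tolerates larger deviations, which is why the three values are usually tuned together.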

Model evaluation

The goodness-of-fit of the developed models was assessed via statistical indices, including the mean-squared error (MSE), root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). MSE can be obtained using Eq. 14:

$$\mathrm{MSE}=\frac{\sum_{\mathrm i=1}^{\mathrm n}\left({\mathrm y}_{\mathrm{act},\mathrm i}-{\mathrm y}_{\mathrm{est},\mathrm i}\right)^2}{\mathrm n}$$
(14)

where \({\text{y}}_{\text{act,i}}\) and \({\text{y}}_{\text{est,i}}\) represent the ith observed and estimated values of the turbidity removal efficiency, respectively, and n is the total number of data points. MSE ranges from zero to positive infinity; the closer it is to zero, the better the fit of the model.

RMSE is the square root of the average of the squared errors. This non-negative index equals zero for a perfect fit, which never happens in practice. RMSE is formulated as follows:

$$\mathrm{RMSE}=\sqrt{\frac{\sum_{\mathrm i=1}^{\mathrm n}\left({\mathrm y}_{\mathrm{act},\mathrm i}-{\mathrm y}_{\mathrm{est},\mathrm i}\right)^2}{\mathrm n}}$$
(15)

MAE, shown in Eq. 16, is another error index, representing the average absolute difference between the predicted and the observed values. Similarly, MAE values near zero indicate a good fit of the model:

$$\mathrm{MAE}=\frac{\sum_{\mathrm i=1}^{\mathrm n}\left|{\mathrm y}_{\mathrm{act},\mathrm i}-{\mathrm y}_{\mathrm{est},\mathrm i}\right|}{\mathrm n}$$
(16)

The coefficient of determination (R2) indicates the ability of the model to approximate the actual data points and varies between zero and one. The closer the coefficient of determination is to one, the better the fit of the model to the data. R2 is calculated by the following relation:

$$\mathrm R^2=\frac{\left[\sum_{\mathrm i=1}^{\mathrm n}\left({\mathrm y}_{\mathrm{act},\mathrm i}-{\overline{\mathrm y}}_{\mathrm{act}}\right)\left({\mathrm y}_{\mathrm{est},\mathrm i}-{\overline{\mathrm y}}_{\mathrm{est}}\right)\right]^2}{\sum_{\mathrm i=1}^{\mathrm n}\left({\mathrm y}_{\mathrm{act},\mathrm i}-{\overline{\mathrm y}}_{\mathrm{act}}\right)^2\sum_{\mathrm i=1}^{\mathrm n}\left({\mathrm y}_{\mathrm{est},\mathrm i}-{\overline{\mathrm y}}_{\mathrm{est}}\right)^2}$$
(17)

where \({\overline{y} }_{\text{act}}\) and \({\overline{y} }_{\text{est}}\) denote the mean of the measured and of the predicted values of the turbidity removal efficiency, respectively.

The relative error is another indicator for assessing the models’ accuracy in predicting responses. Based on the formula given in Eq. 18, the lower the absolute value of the relative error, the higher the accuracy of the proposed model:

$$\mathrm{RE}_{\mathrm i}=\frac{{\mathrm y}_{\mathrm{act},\mathrm i}-{\mathrm y}_{\mathrm{est},\mathrm i}}{{\mathrm y}_{\mathrm{act},\mathrm i}}\times100$$
(18)
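The indices of this section can be computed together, as in the minimal sketch below. The function name `evaluate` and the sample values are hypothetical and are not taken from the study's data; the relative error is computed per sample from each observed and estimated pair.

```python
import math


def evaluate(y_act, y_est):
    """Return MSE, RMSE, MAE, R2, and per-sample relative error (%)."""
    n = len(y_act)
    errors = [a - e for a, e in zip(y_act, y_est)]

    mse = sum(err ** 2 for err in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(err) for err in errors) / n

    # R2 as the squared correlation between observed and estimated values
    mean_act = sum(y_act) / n
    mean_est = sum(y_est) / n
    cov = sum((a - mean_act) * (e - mean_est) for a, e in zip(y_act, y_est))
    var_act = sum((a - mean_act) ** 2 for a in y_act)
    var_est = sum((e - mean_est) ** 2 for e in y_est)
    r2 = cov ** 2 / (var_act * var_est)

    rel_err = [100.0 * (a - e) / a for a, e in zip(y_act, y_est)]
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2, "RE": rel_err}


# Hypothetical removal efficiencies (%), for illustration only
metrics = evaluate([90.0, 80.0, 70.0], [88.0, 82.0, 70.0])
```

Note that R2 defined this way measures linear association only, so it is best read alongside the error indices rather than on its own.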

Results and discussion

Assessment of the ANN model

The proposed optimal ANN structure consists of an input layer (representing the most appropriate variables, i.e., GO dosage, pH, and initial turbidity), one hidden layer, and one output layer giving the network’s response (turbidity removal efficiency). Additionally, the “Tansig” and “Purelin” transfer functions were employed at the hidden and the output layers, respectively. In order to avoid over-fitting of the model, a program was developed in MATLAB 2020b to find the optimum number of neurons and to automatically select the best network training and learning functions. As a result, a network with three neurons in the hidden layer, “Trainbr” as the training function, and “learnlv1” as the learning function gave the most accurate response compared with the other architectures.
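The forward pass of such a 3-3-1 network can be sketched as follows. This is an illustrative sketch, not the authors' MATLAB program: the function names are hypothetical, the weights and biases must be supplied by the caller (they would normally come from training), and the inputs are assumed to be scaled beforehand.

```python
import math


def tansig(v):
    """Hyperbolic tangent sigmoid; mathematically equal to tanh."""
    return math.tanh(v)


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with tansig units and a linear (purelin) output.

    inputs:   scaled [GO dosage, pH, initial turbidity]
    w_hidden: one weight row per hidden neuron
    b_hidden: one bias per hidden neuron
    w_out:    one output weight per hidden neuron
    b_out:    output bias
    """
    hidden = [
        tansig(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # purelin is the identity, so the output is a plain weighted sum
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical weights: identity hidden weights, unit output weights
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = forward([0.5, -0.5, 0.0], identity, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 0.0)
```

With three inputs and three hidden neurons, the network has 9 hidden weights, 3 hidden biases, 3 output weights, and 1 output bias, i.e., 16 adjustable parameters, which is small relative to the calibration set.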

Figure 3a and b show the modeling results of the efficiency of the turbidity removal from water utilizing GO in the lab-scale water treatment process. The scatter plot of observed and predicted turbidity removal values for calibration data is displayed in Fig. 3a. The high value of R2 (0.9129) shows an excellent performance of the model. Additionally, the coefficient of determination (R2) between the actual values (results obtained in the laboratory) and the predicted values of the validation data estimated through ANN is equal to 0.9492, which indicates that the model reasonably offers a good fit (see Fig. 3b). The high coefficients of R2 in Fig. 3 prove that the calibration and validation processes of the developed ANN model have been accomplished well. Similar results have been reported by Onukwuli et al. (2021) who simulated the dye-polluted wastewater decontamination using bio-coagulants via ANN model. They fed the model with 100 experimental data, 70% of which was used for training and the rest for validation and testing processes. According to their results, the developed ANN model predicted the process with very high accuracy (Regression coefficient R2 = 0.9999) due to its ability to approximate all types of structures (Onukwuli et al. 2021). Zangooei et al. (2016) simulated the coagulation–flocculation process with poly aluminum chloride (PAC) as coagulant, to predict the water turbidity after the process. They considered three independent variables including pH, PAC dosage, and influent turbidity for modeling the process using multi-layer neural network. As they described, their ANN model had the ability to predict the effluent turbidity with a high coefficient of determination (R2 = 0.96) during testing the model (Zangooei et al. 2016). It is noteworthy that Zangooei et al. (2016) used 236 experimental data, of which 85 percent was used for training, and the rest was used for the testing of the network. 
It is therefore interesting that, using a much smaller number of data in the present study (79 data points) and only 73% of the data for training the model, the ANN technique still obtained outstanding results in terms of the coefficient of determination for both the calibration (R2 = 0.9129) and the validation (R2 = 0.9492) processes. Such an observation may be supported by the fact that ANN, as a black box model, focuses mainly on the analysis of the available data and the simulation of any nonlinear equation (Golbaz et al. 2020).

Fig. 3 Coefficient of determination between observed and predicted values of the calibration (a) and the validation (b) data for ANN model

Assessment of the ANFIS model

ANFIS calibration and validation performances are presented in Fig. 4a and b, respectively. The coefficient of determination of 0.936 for the calibration of the ANFIS model (Fig. 4a) indicates that the developed model performs suitably in approximating the turbidity removal efficiency using GO as a coagulant. Moreover, the R2 of 0.877 for the validation data denotes the effectiveness and reliability of the proposed model for extracting features from input data (Fig. 4b).

Fig. 4 Coefficient of determination between observed and predicted values of the calibration (a) and the validation (b) data for ANFIS model

Similarly, Taheri et al. (2013) pointed out that ANFIS model successfully predicted the electrocoagulation–coagulation process with R2 value of 0.923 for a total of 78 test and train data. Also, some previous investigations recommended ANFIS as a powerful tool for modeling of adsorption process (Khomeyrani et al. 2021; Hanumanthu et al. 2021). Heddam et al. (2012) used ANFIS for modeling of coagulant dosage in a water treatment plant. As they described, the developed subtractive clustering-based ANFIS model provided accurate and reliable coagulant dosage prediction. The qualitative human judgment and expert knowledge, dependency of input variables, absence of mathematical models, and nonlinearity of relationships are the conditions making ANFIS a favorable modeling method for the processes such as coagulation–flocculation which involve many complex physical and chemical phenomena (Heddam et al. 2012; Hawari et al. 2017).

Assessment of the SVR model

The coefficient of determination between the predicted values and the calibration data is shown in Fig. 5a. With a value above 0.7, SVR showed acceptable performance in the calibration process. Similarly, an R2 of 0.864 between the observed and the predicted values in the validation process indicated the applicability of SVR for predicting the turbidity removal from water using GO (see Fig. 5b). This may be attributed to the fact that, whereas traditional regression models use the empirical risk minimization (ERM) principle to minimize the training error, SVR uses the structural risk minimization (SRM) principle, which takes the capacity of the learning machine into account and thereby improves the generalization accuracy (Parveen et al. 2017).

Fig. 5 Coefficient of determination between observed and predicted values of the calibration (a) and the validation (b) data for the SVR model

Parveen et al. (2017) reported a high correlation coefficient of R = 0.9986 for predicting the adsorptive removal of Cr(VI) ions from wastewater with an SVR-based model developed using a whole data set of 124 samples (80% as training and 20% as test datasets). According to another study conducted by Parveen et al. (2019), an SVR-based model accurately predicted the adsorptive removal of Ni(II) ions from wastewater, with a high correlation coefficient (R) of 0.993. They used a whole dataset of 382 samples partitioned into training (80%) and test (20%) datasets. Additionally, Zaghloul et al. (2020) showed that the SVR technique, thanks to the penalty placed on the prediction errors, predicted the aerobic granular process with high accuracy (R2 of 0.99 for validation data). They fed their SVR model with 2920 experimental data points, 89% of which were used for training and the rest for validation. Comparison of the results of the present study with previous studies shows that the prediction performance of the SVR technique depends not only on the type of problem, but also on the availability of a sufficiently large data set.

Comparison of the data-driven models

Table 1 shows a satisfactory agreement between the experimental results and the values predicted by the RSM, ANN, ANFIS, and SVR models in the validation process. Statistical measures including MSE, RMSE, MAE, and R2 were used to precisely compare the capability of the models in predicting the turbidity removal using GO. The results are given in Table 2. As seen in the table, the predicted values of all the models correlated very well with the observed results (R2 > 0.86). Generally, an R2 greater than 0.8 together with low relative errors between measured and predicted values indicates strong model performance (Kennedy et al. 2015).
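The four statistical measures can be computed as in the following minimal sketch. It assumes that R2 is defined as 1 − SS_res/SS_tot; this section does not state the exact definition used (the squared Pearson correlation is another common choice), and the function name and sample values are hypothetical.

```python
import math


def validation_metrics(observed, predicted):
    """Return MSE, RMSE, MAE and R2 for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n
    mae = sum(abs(r) for r in residuals) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Assumed definition: R2 = 1 - SS_res / SS_tot (requires ss_tot > 0).
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


# Hypothetical removal efficiencies (%), not the values of Table 1.
observed = [90.0, 80.0, 70.0, 60.0]
predicted = [88.0, 83.0, 69.0, 62.0]
print(validation_metrics(observed, predicted))
```

Because RMSE is the square root of MSE, the two indicators always rank the models identically; MAE can rank them differently because it weights large errors less heavily.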

Table 1 Experimental observations and the models’ predicted values in validation process
Table 2 Statistical evaluation parameters for the validation data

However, the highest validation R2 and the lowest values of the MSE, RMSE, and MAE indicators were obtained for the ANN model. While the validation R2 values for the ANFIS, SVR, and RSM models were similar and in the range of 0.864–0.877, this parameter was notably higher, at about 0.95, for the ANN model. This means that the ANN model was the most accurate in approximating the effect of GO on turbidity removal from water, thanks to its capability of predicting multiple complex and nonlinear functions. Besides, the ANN technique is flexible in that new experimental data can be added to build a reliable and accurate model without requiring a standard experimental design (Geyikçi et al. 2012; Maran and Priya 2015).

The results are consistent with some previous reports. According to Zangooei et al. (2016), the ANN model outperformed fuzzy regression analysis in simulating the coagulation–flocculation process and predicting effluent turbidity under different experimental conditions (pH, influent turbidity, and PAC concentration). As they described, the R2 of the validation process was 0.96 for the ANN model and 0.93 for the fuzzy regression analysis with a quadratic function. In addition, Maran and Priya (2015), Golbaz et al. (2020), Onu et al. (2021), and Onukwuli et al. (2021) reported the superiority of the ANN over the RSM model, as the RSM predictions deviated more from the observations while the residuals of the ANN were insignificant. However, Igwegbe et al. (2019), who simulated the adsorptive removal of methylene blue dye by ANN and RSM techniques, both using the same experiments planned through the CCD (21 runs), explained that, owing to the very limited number of experimental runs, the prediction performance of the ANN model was less acceptable than that of the RSM model. Similar results were reported by Uzoh and Onukwuli (2017), who compared the prediction performance of ANN and RSM models developed using the same 30 experiments designed by RSM. They noted that ANN generally performs better when a very large number of data points is used for training the network. Therefore, where only very little data is available, the RSM technique can be expected to predict more accurately than the ANN method.

As shown in Fig. 5 and Table 2, the SVR model had the lowest R2 values for both the calibration (0.7119) and validation (0.8643) datasets, as well as the highest MSE, RMSE, and MAE values. Accordingly, the SVR model exhibited the weakest performance in predicting the GO-based coagulation process among the evaluated techniques. This result is inconsistent with that reported by Parveen et al. (2017), who explained that the SVR-based model predicted the adsorptive removal of Cr(VI) ions from wastewater with a higher correlation coefficient than the ANN model (R = 0.9986 and 0.9331, respectively). These contradictory results indicate that the performance of the modeling techniques depends substantially on the type of problem and confirm the importance of determining the appropriate method for modeling each process.

The ANFIS model yielded the highest R2 for the calibration process, indicating that it performed best in the training step. In addition, for both the ANN and SVR models, the R2 for the validation data was larger than that for the calibration data, whereas the opposite held for the ANFIS model, whose R2 was smaller for the validation than for the calibration data. This is because the ANFIS model considers the uncertainties of the input/output data and experimental conditions, which helps the model provide a more realistic representation of the actual process. Zangooei et al. (2016), who used the ANFIS technique to model turbidity removal using PAC as the coagulant, similarly reported a larger R2 for the training than for the testing dataset.

Figure 6 shows the observed data alongside the estimated values of the turbidity removal efficiency for the proposed models during the validation process. To better interpret the figure in terms of the error characteristics of each model, the relative errors of the predictions were calculated according to Eq. 17 and plotted versus the observed data in Fig. 7a. It should be noted that the relative error is undefined when the actual value is zero, as it appears in the denominator (Chen et al. 2017). For this reason, the second point of Fig. 6, at which the observed removal efficiency was equal to zero, is not shown in Fig. 7a. As seen in Fig. 7a, for all the models, the largest relative errors were obtained for the smaller observed values (i.e., the lower efficiencies), and the relative errors approach zero as the observed values increase. The smaller relative errors for the larger observed values (i.e., the higher efficiencies) and the larger relative errors for the smaller observed values could be attributed, respectively, to the abundance of the former and the scarcity of the latter in the data. In fact, the models were better trained and calibrated for the larger observed values. It is noteworthy that the very good performance of GO under most of the tested conditions led to the abundance of the larger observed values.
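The relative error calculation, including the exclusion of zero-valued observations, can be sketched as follows. Eq. 17 is not reproduced in this section, so the form RE (%) = 100 × (observed − predicted) / observed is an assumption; it was chosen because a positive error then corresponds to underestimation, as in the discussion of Fig. 7b. The function name and sample values are hypothetical.

```python
def relative_errors(observed, predicted):
    """Return (observed, relative error in %) pairs.

    Assumed form of Eq. 17: RE = 100 * (observed - predicted) / observed.
    Points with observed == 0 are skipped because RE is undefined there.
    """
    pairs = []
    for obs, pred in zip(observed, predicted):
        if obs == 0:
            continue  # zero denominator: relative error undefined
        pairs.append((obs, 100.0 * (obs - pred) / obs))
    return pairs


# Hypothetical removal efficiencies (%), not the values plotted in Fig. 6.
observed = [0.0, 10.0, 50.0, 95.0]
predicted = [3.0, 13.0, 52.0, 96.0]
for obs, re_pct in relative_errors(observed, predicted):
    print(f"observed={obs:5.1f}  RE={re_pct:7.2f}%")
```

Under this assumed definition, a given absolute deviation also translates into a larger relative error when the observed value is small, simply because the denominator is smaller.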

Fig. 6 Comparison of the measured and the predicted values for the validation data set

Fig. 7 Relative errors versus the observed data (a) and the frequency distribution (b) for the validation data set

Figure 7b shows the frequency distribution of the relative errors for each model based on the validation data set. According to the results, the SVR model produced larger errors than the other models. As depicted in Fig. 7b, only 57.1% of the predictions generated by the SVR model had a relative error of less than 10%, whereas 76.1%, 71.4%, and 66.6% of the predictions generated by the ANN, RSM, and ANFIS models, respectively, fell within the same error range. Moreover, about 62% of the predictions of the SVR model showed a positive relative error, indicating a clear tendency of the model to underestimate the observed data. The opposite held for the ANN model, which overestimated 62% of the experimental results. The ANFIS and RSM models performed better in this regard, as their errors were more evenly balanced between positive and negative values than those of the ANN and SVR models: 47.7% and 52.3% of the predictions generated by the ANFIS and RSM models, respectively, had a negative relative error.
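The shares discussed for Fig. 7b can be obtained from a list of relative errors as in the sketch below. It assumes that "a relative error of less than 10%" refers to the absolute value of the error; the function name, the threshold argument, and the sample values are hypothetical.

```python
def summarize_relative_errors(errors_pct, threshold=10.0):
    """Share (%) of errors within the threshold, and of positive/negative errors."""
    if not errors_pct:
        raise ValueError("at least one relative error is required")
    n = len(errors_pct)
    within = sum(1 for e in errors_pct if abs(e) < threshold)
    positive = sum(1 for e in errors_pct if e > 0)  # underestimation (assumed sign)
    negative = sum(1 for e in errors_pct if e < 0)  # overestimation (assumed sign)
    return {
        "within_threshold_pct": 100.0 * within / n,
        "positive_pct": 100.0 * positive / n,
        "negative_pct": 100.0 * negative / n,
    }


# Hypothetical relative errors (%), not the validation results of this study.
errors = [2.0, -5.0, 12.0, 8.0, -15.0, 9.5, 30.0]
print(summarize_relative_errors(errors))
```

A positive share close to 50% points to errors that are balanced around zero, whereas a share well above or below 50% points to systematic under- or overestimation.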

Generally, all the models provided accurate and reliable turbidity removal predictions and could minimize the dependency on knowledge of the physicochemical properties of the processes. However, the characteristics of the ANN technique, such as the ability to learn nonlinear functions with complex relationships, to continue approximating the output when one or more cells of the network are corrupted (fault tolerance), and to generalize and infer unseen relationships on unobserved data, made this technique superior to the other models in predicting the process efficiency. Nevertheless, the ability of the ANFIS model to take uncertainties into account in the approximation process cannot be ignored; it is therefore recommended to use this AI technique for modeling processes with considerable uncertainties in the input/output data and the experimental conditions. Additionally, although the predictions of the RSM model were less accurate than those of the ANN and ANFIS models, the use of this modeling technique, even before developing the artificial intelligence models, is essential for understanding the nature of the process and obtaining useful information about the contribution of the independent variables and their complex interactions.

Identification of the input parameters using PMI

The relative importance of the input variables used in the data-driven models for the desired output was investigated using the PMI algorithm. The higher the PMI score of a variable, the greater the effect of that parameter on the response. In order to address the impacts of physical and chemical factors on turbidity removal efficiency, the input parameters considered were pH, GO dosage, initial turbidity, and the rates of the slow and rapid mixing steps. The results indicated that pH, GO dosage, and initial turbidity were the most important parameters affecting water turbidity removal using GO as a coagulant. The PMI scores obtained for pH, GO dosage, and initial turbidity (as input parameters for developing the data-driven models) were 0.72, 0.608, and 0.415, respectively, which showed that pH had the greatest effect on the process efficiency. It is noteworthy that the results obtained by the PMI algorithm were in line with those reported by Rezania et al. (2021), who simulated the process through RSM and found pH and GO dosage, in that order, to be the most influential parameters.
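The idea behind a PMI score can be sketched as follows. This is a deliberately simplified illustration and not the algorithm used in this study: mutual information is estimated with equal-width histograms, and the influence of an already selected input is removed with a straight-line fit, whereas published PMI implementations typically rely on kernel-based estimators. All function names, the bin count, and the data are hypothetical.

```python
import math
import random


def mutual_information(x, y, bins=5):
    """Histogram (equal-width bins) estimate of mutual information, in nats."""
    n = len(x)

    def bin_index(value, low, high):
        if high == low:
            return 0
        return min(int((value - low) / (high - low) * bins), bins - 1)

    x_low, x_high, y_low, y_high = min(x), max(x), min(y), max(y)
    joint, count_x, count_y = {}, {}, {}
    for xv, yv in zip(x, y):
        i = bin_index(xv, x_low, x_high)
        j = bin_index(yv, y_low, y_high)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        count_x[i] = count_x.get(i, 0) + 1
        count_y[j] = count_y.get(j, 0) + 1
    return sum(
        (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
        for (i, j), c in joint.items()
    )


def linear_residuals(target, conditioner):
    """Residuals of a least-squares line target ~ conditioner (simplification)."""
    n = len(target)
    mean_t = sum(target) / n
    mean_c = sum(conditioner) / n
    var_c = sum((c - mean_c) ** 2 for c in conditioner)
    cov = sum((c - mean_c) * (t - mean_t) for c, t in zip(conditioner, target))
    slope = 0.0 if var_c == 0 else cov / var_c
    return [t - (mean_t + slope * (c - mean_c)) for t, c in zip(target, conditioner)]


def partial_mutual_information(candidate, output, selected=None):
    """MI between candidate and output after removing one selected input's effect."""
    if selected is None:
        return mutual_information(candidate, output)
    return mutual_information(
        linear_residuals(candidate, selected), linear_residuals(output, selected)
    )


# Hypothetical data: the output depends on `informative` but not on `unrelated`.
rng = random.Random(0)
informative = [float(i) for i in range(50)]
unrelated = [rng.random() for _ in range(50)]
output = [2.0 * v for v in informative]
print(partial_mutual_information(informative, output))
print(partial_mutual_information(unrelated, output))
```

In a selection loop, the candidate with the highest score would be added to the selected set and the remaining candidates re-scored against the residuals, so that redundant inputs receive low scores.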

Conclusions

In the present study, three artificial intelligence models, namely ANN, ANFIS, and SVR, were developed for predicting the turbidity removal efficiency using GO as a coagulant and then compared with each other and with the results of the RSM model previously reported by the authors. The ability of the models to predict the process efficiency was compared using statistical indices. All the models successfully approximated the behavior of the process, based on their high coefficients of determination (R2 > 0.86) for the validation process. However, the highest validation R2 and the lowest values of the MSE, RMSE, and MAE indicators were obtained for the ANN model (0.949, 32.61, 5.71, and 4.22, respectively), indicating the superior performance of this AI technique over the other techniques in predicting the process efficiency. In contrast, the SVR model showed the weakest prediction performance, with the lowest validation R2 of 0.864. It was also found that the ANN model predicted the observed data with low error margins, as 76.1% of the predictions made by this technique had relative errors (RE) of less than 10%. However, only 57.1% of the predictions generated by the SVR model were characterized by RE < 10%.

According to the results, ANN was identified as the most appropriate technique for modeling the process. However, simulating the process using the RSM technique is also recommended, as it helps to understand the nature of the process and the interaction effects of the independent variables using the smallest number of experiments.

For future research, it is recommended to train the algorithms developed in the present work using more experimental data, in order to expand the applicability of the models. Moreover, using other machine learning algorithms for modeling the process and comparing the results with the present study will be of great value.