Introduction

With the world’s rapid population growth and intense industrialization in the twentieth century, environmental pollution has become a global problem with adverse impacts on the water sector [1]. The vast majority of the remaining pollution issues are caused by heavy metals and persistent organic compounds because of their resistance to conventional treatments such as physico-chemical or biological methods. This results in the detection of refractory pollutants in rivers, lakes, oceans and even drinking waters all over the world [2, 3]. These compounds can cause hazardous health effects on living organisms, including human beings. Therefore, advanced water and wastewater treatment have become a primary social, political, and environmental concern [4,5,6].

In recent years, electrochemical processes, considered eco-friendly and green technologies, have been gaining attention as an alternative method for water and wastewater treatment because of their ability to remove persistent organic pollutants. In the case of pharmaceutical pollutants, for example, it has been demonstrated that electrochemical advanced oxidation processes are significantly more efficient than biological methods [7]. They offer advantages including versatility, high energy efficiency, amenability to automation, and cost-effectiveness [8, 9]. On the other hand, drawbacks include the need for conductive wastewater, the formation of organic substances on the surface of the electrode that reduce its active surface, and the potential formation of harmful intermediate by-products. Several studies focusing on different electrochemical methods such as electrooxidation, electrocoagulation, electroflotation, electro-Fenton, and electrodialysis have been published within the last decade for improving the treatment performance of wastewaters and drinking waters [10,11,12,13].

To make these technologies competitive with other conventional technologies, proper design of the process and operating conditions via process modelling and optimization is critical. Process optimization necessitates process modelling. For water and wastewater treatment processes, phenomenological and empirical modelling approaches are commonly used [14,15,16]. Due to the intricate interactions between the input and output variables, electrochemical techniques for treating water are highly complicated, nonlinear systems. This is because in an electrochemical system, numerous mechanisms frequently occur concurrently and in a non-additive way. For instance, precise mechanisms of charge transfer, electrochemical kinetics, thermodynamics, adsorption isotherms and kinetic models, flocculation, flotation, settling, and complexation should be understood in the context of electrocoagulation processes. Additionally, in electrooxidation, each compound's concentration in an electrochemical cell depends on time and space, or more specifically, on the distance from the electrode surface. Partial differential equations, which are frequently challenging to solve and have several model parameters, are in theory recommended to describe the profile of compounds under these circumstances. The number of species included in the model determines how complex the model is. All essential species in an electrochemical cell are to be taken into account, leading to a large multivariable model. This, however, requires knowledge of the reaction pathways to account for subsequent formations and transformations [17]. One alternative to phenomenological modelling is empirical (regression) modelling [18]. The quadratic linear regression model is typically chosen; however, it is often insufficient to capture the nonlinearities of the systems.
Therefore, modelling, simulating, and optimizing the processes using either phenomenological or conventional empirical models is not necessarily the best course of action. Artificial intelligence (AI) methods such as artificial neural networks (ANNs), adaptive neuro-fuzzy inference system (ANFIS), support vector machines (SVM) along with genetic algorithms (GA) and particle swarm optimization (PSO) methods have emerged as attractive alternative approaches for modelling and optimization of these nonlinear processes in case phenomenological or conventional regression models are not practical [19].

In this work, applications of artificial intelligence techniques in modelling electrochemical processes for water and wastewater treatment processes are discussed. To make AI modelling approach performance competitive to other conventional modelling approaches usually used (e.g., response surface methodology), it is important to build robust and reliable AI models. While the trend to use AI models is increasing in different fields of science, including electrochemical processes, the lack of attention to reliability and robustness of these models can have a negative impact on the progression of this field. Therefore, in addition to discussing the common AI techniques used in electrochemical processes for water and wastewater treatment, efforts were made to review and summarize the current knowledge of the literature on the scope of reliability and robustness of these models. As such, this review aims to shed light on the black-box modelling aspect of these data-driven models hoping that it could fill the gaps in reliability and interpretability of these techniques applied to electrochemical processes in water and wastewater treatment. Furthermore, several review studies have focused only on the application of ANNs as an AI technique in water treatment or other chemical processes [20,21,22,23]. In this study, we attempted to broaden the scope of our research to include other AI techniques such as ANFIS, SVR, and metaheuristic optimization algorithms such as GA and PSO. The reasons mentioned above were the main motivation of the authors for this paper since, to the knowledge of the authors, there is no specific review for this particular subject.

This review starts with a chapter describing the data sets of electrochemical processes for water and wastewater treatment. Then, the common AI techniques applied in the field with their applications will be presented. The optimization of hyperparameters, techniques to prevent overfitting, and sensitivity analysis for interpretation of the developed models for electrochemical processes are provided in chapter 4. Finally, a discussion of conclusions, challenges and future perspectives is presented.

Data sets

Electrochemical processes

Most of the data sets in published articles derive from four electrochemical processes: electrooxidation, electrocoagulation, electro-Fenton, and electrodialysis. In this section, these electrochemical processes are briefly described and their AI modelling applications for water and wastewater treatment are reviewed.

Electrooxidation

Municipal wastewater treatment plants (MWWTP) are not able to completely remove persistent organic pollutants, pesticides, and pharmaceuticals. Hence, their persistence in the effluent is of particular importance because it can increase the risk of long-term exposure, responsible for chronic toxicity and subtle effects on animals, plants and the aquatic environment [24, 25].

Electrochemical oxidation is a promising advanced oxidation technique for treating various wastewaters polluted by persistent organic compounds [16, 26,27,28,29,30]. Since it combines chemistry (generation of in situ oxidants) and electricity (electron transfer), it is an environmentally friendly technology [31]. Electrochemical oxidation occurs based on two different mechanisms:

i. Direct oxidation: hydroxyl radicals (\(E^{\circ}(\mathrm{OH}^{\circ}/\mathrm{H}_{2}\mathrm{O}) = 2.80\) V vs. SHE) are produced at the electrode surface by the oxidation of water molecules (Eq. 1), and organic compounds can be completely mineralized (electrochemical combustion) or degraded (electrochemical conversion) by reacting with adsorbed \(\mathrm{OH}^{\circ}\) radicals [32] (Eq. 2).

$$\mathrm{M}+\mathrm{H}_{2}\mathrm{O}\to \mathrm{M}\left(\mathrm{OH}^{\circ}\right)+\mathrm{H}^{+}+\mathrm{e}^{-}$$
(1)
$$\mathrm{M}\left(\mathrm{OH}^{\circ}\right)+\mathrm{Organics}\to \mathrm{M}+\mathrm{Oxidized\; products}$$
(2)

ii. Indirect oxidation: other radical systems can be promoted by the generation of different oxidant mediators in the bulk solution, such as \(\mathrm{H}_{2}\mathrm{O}_{2}\), \(\mathrm{HClO}\) and \(\mathrm{S}_{2}\mathrm{O}_{8}^{2-}\) [33, 34].

Table 1 summarizes the application of artificial intelligence (AI) modelling approaches of electrochemical oxidation for water and wastewater treatment processes.

Table 1 Application of AI modelling of electrochemical oxidation for water and wastewater treatment processes

Electrocoagulation

Electrocoagulation (EC), developed from chemical coagulation, produces coagulant agents (\(\mathrm{Fe}^{2+}\)/\(\mathrm{Fe}^{3+}\) or \(\mathrm{Al}^{3+}\)) in situ to effectively remove pollutants by deposition on the cathode or by flotation caused by the generation of hydrogen gas at the cathode [48]. The following equations describe the main reactions occurring in an EC cell:

$$\begin{array}{cc}\mathrm{At}\;\mathrm{the}\;\mathrm{anode}:&M_{\left(s\right)}\rightarrow M_{(aq)}^{n+}+{ne}^-\end{array}$$
(3)
$$\begin{array}{cc}\mathrm{At}\;\mathrm{the}\;\mathrm{cathode}:&{2H}_2O+{2e}^-\rightarrow{2OH}^-+H_2\end{array}$$
(4)
$$\begin{array}{cc}\mathrm{In}\;\mathrm{the}\;\mathrm{bulk}\;\mathrm{solution}:&M_{(aq)}^{n+}+n{OH}^-\rightarrow{M(OH)}_{n(s)}\end{array}$$
(5)

where M(s) is the metal, \({\mathrm{M}}_{(\mathrm{aq})}^{\mathrm{n}+}\) refers to the metallic ion (iron or aluminum ion), \({\mathrm{M}(\mathrm{OH})}_{\mathrm{n}(\mathrm{s})}\) represents the metallic hydroxide, and \(n\) is the number of electrons transferred in the reaction at the electrode. It is worth mentioning that Eq. 5 describes a simple case of metallic hydroxide formation. In fact, depending on the pH and the type of metal involved, the formation of different metallic complex species is possible [49].

EC has several advantages over chemical coagulation, such as easy automation, low salinity of the effluent after treatment, low footprint, and reduced production of solid residuals [4]. The EC process has been widely studied for environmental applications to treat drinking water, urban wastewater, textile wastewater, restaurant wastewater, refractory oily wastewater, and heavy metal-containing wastewaters [50,51,52,53,54,55].

There are a number of studies regarding the application of artificial neural networks for modelling wastewater treatment by electrocoagulation processes (see Table 2).

Table 2 Application of ANNs for modelling of wastewater treatment by electrocoagulation

Electro-Fenton

The electro-Fenton (EF) process is an indirect electrochemical advanced oxidation process since hydroxyl radicals are not generated directly from charge transfer at the electrode but in the solution from the well-known Fenton reaction. The EF process was developed to overcome the drawbacks of the classical Fenton process and to improve the degradation efficiency of the target pollutants [70]. In the Fenton process, Fenton's reagent, a mixture of H2O2 and Fe2+, is added externally to the solution to be treated to produce homogeneous hydroxyl radicals (˙OH) [71]. Conversely, in the EF process Fenton's reagent is electrochemically produced in situ at the cathode and then reacts according to Eq. 6. The method relies on the electrochemical reduction of oxygen at the cathode to continuously produce hydrogen peroxide (H2O2) in an acidic medium (Eq. 7). In addition, ferric cations (Fe3+) are reduced to Fe2+ (Eq. 8). At the anode, oxygen is produced by the oxidation of water (Eq. 9) [72, 73].

$${\mathrm{Fe}}^{2+}+{\mathrm{H}}_{2}{\mathrm{O}}_{2}\to {\mathrm{Fe}}^{3+}+{}^{\bullet}\mathrm{OH}+{\mathrm{OH}}^{-}$$
(6)
$${\mathrm{O}}_{2}+{2\mathrm{H}}^{+}+2{\mathrm{e}}^{-}\to {\mathrm{H}}_{2}{\mathrm{O}}_{2}$$
(7)
$${\mathrm{Fe}}^{3+}+{\mathrm{e}}^{-}\to {\mathrm{Fe}}^{2+}$$
(8)
$$2{\mathrm{H}}_{2}\mathrm{O}\to {{\mathrm{O}}_{2}+4\mathrm{H}}^{+}+ {4\mathrm{e}}^{-}$$
(9)

EF has been widely applied to the treatment of organic pollutants in water and wastewater. These studies include pharmaceuticals [74,75,76], dyes and textile wastewaters [77, 78], endocrine disrupting compounds [79], pesticides [80], polycyclic aromatic hydrocarbons [81], surfactants [82] and landfill leachates [83].

Applications of ANNs, the only AI technique used so far for water and wastewater treatment with the EF process, are presented in Table 3.

Table 3 Applications of ANNs for the water and wastewater treatment using the EF process

Electrodialysis

Electrodialysis (ED) is an electrochemical method that uses an electrical potential difference as the driving force to remove ionic contaminants from an aqueous solution. As a result, two new solutions are generated: one is an ion concentrate, and the other is nearly pure water. In electrodialysis, the inherent properties of the ion exchange membrane, the operating conditions, and the physicochemical characteristics of the metal ions all have an impact on the effectiveness of ion separation [89, 90]. Because of its high chemical stability, flexibility, and high ionic conductivity thanks to its strong ionic characteristics, ED has been widely used for the treatment of industrial wastewaters, production of drinking and processed water from brackish water and seawater, recovery of useful materials from effluents, recovery of heavy and carcinogenic metals, and salt production [91,92,93,94,95].

Table 4 summarizes the applications of ANNs, as the only AI model used for water and wastewater treatment with the ED process.

Table 4 Applications of ANNs for water and wastewater treatment with ED process

Size of data sets

Data-driven AI techniques depend strongly on the quantity and quality of the data sets fed into them. In other words, enough reliable data are required to reasonably capture the relationships both among the input variables and between the input and output variables. The size of the data set required for machine learning approaches depends on the complexity of the problem and of the learning algorithm, and the amount of data needed cannot be known with certainty in advance. Since data used for modelling and optimization of electrochemical processes for water and wastewater treatment are derived mainly from experimental studies, acquiring sufficiently large data sets requires a considerable amount of time and resources. Figure 1 shows the distribution of the number of samples in data sets used in the field in the literature. As can be seen in Fig. 1, most of the studies have implemented AI techniques with a relatively low number of samples (< 150) in their data sets. Hence, considering the amount of data available, most of the effort should be focused on the reliability and robustness of the AI models derived from these data sets.

Fig. 1 Frequencies of articles in the literature regarding the size of the data sets

However, to overcome the limitation of small experimental data sets in AI modelling, some authors have proposed using data augmentation techniques such as interpolation [47]. When a data set is too small to learn the many parameters of a learning algorithm, overfitting results, meaning that the generalization of the model is unreliable. In principle, this problem is solved by collecting more data. Still, in actual applications, additional data collection is often difficult for various reasons, such as time and cost limitations. Data augmentation is one way to address this [101]. Although data augmentation techniques have been applied to machine learning in different fields in the literature, especially image processing and speech recognition [102,103,104], one should be cautious about using these techniques for regression of experimental work with limited data. This is because the behaviour of outputs in experimental studies can be much more complicated than predefined interpolation functions can describe, whereas the interpolated data follow those functions by construction and are therefore easy for the AI model to predict.
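The caution above can be illustrated with a minimal sketch. It assumes linear interpolation between two neighbouring experimental points; the interpolation function actually used in [47] is not stated here, and the function name and data values are hypothetical. Every synthetic point lies exactly on the chord between the two measurements, so it adds no information about curvature between them.

```python
def interpolate_between(p0, p1, n_new):
    """Insert n_new evenly spaced synthetic points on the straight line from p0 to p1.

    p0 and p1 are (input, output) pairs, e.g. (time in min, removal in %).
    Linear interpolation is an assumption made for illustration only.
    """
    (x0, y0), (x1, y1) = p0, p1
    points = []
    for k in range(1, n_new + 1):
        fraction = k / (n_new + 1)
        points.append((x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0)))
    return points


# Hypothetical measurements: removal efficiency (%) at 0 and 10 min
measured = [(0.0, 0.0), (10.0, 20.0)]
synthetic = interpolate_between(measured[0], measured[1], n_new=1)
print(synthetic)  # [(5.0, 10.0)]
```

A model trained and validated on such points is scored partly on data that a straight line already reproduces perfectly, which can make the validation error look better than it would be on new experiments.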

Data preprocessing

Experimental data obtained in electrochemical processes are used by AI models: the independent and dependent experimental variables serve as model inputs and outputs, respectively. While various independent variables have been used in electrochemical processes, Fig. 2 shows the most common ones specified as inputs. As can be seen, electrolysis time and applied current have been the most frequent input variables for AI modelling of electrochemical processes. Less frequently used inputs were feed flowrate, temperature, mixing speed, nature of the electrolyte and type of the pollutant.

Fig. 2 Frequencies of different independent variables used as AI model inputs in literature

Feature scaling has often been used to scale the variables in the dataset. If the input and output variables are not of the same order of magnitude, some variables may appear to have more significance than they actually do. The training algorithm then has to compensate for order-of-magnitude differences by adjusting the network weights, which many training algorithms (e.g., the backpropagation algorithm in ANNs) do not do very effectively [105].

Normalization and standardization have been utilized as feature scaling techniques in the reviewed studies. In the common normalization technique, so-called Min–Max scaling, values are shifted and rescaled so that they range between 0 and 1 [21]. In standardization, values are centred on the mean and divided by the standard deviation, so that the resulting feature has zero mean and unit standard deviation [41].
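The two scaling techniques can be sketched as follows. This is a minimal standard-library illustration, not code from the reviewed studies; the function names and the electrolysis times are hypothetical, and the population standard deviation is assumed for standardization.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant feature cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Centre values on zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(v - mu) / sigma for v in values]


# Hypothetical electrolysis times in minutes (illustrative values only)
electrolysis_time = [10.0, 20.0, 30.0, 40.0, 50.0]
scaled = min_max_scale(electrolysis_time)
z_scores = standardize(electrolysis_time)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

In practice the scaling parameters (minimum and maximum, or mean and standard deviation) would be computed on the training set only and then reused for validation and test data, so that no information leaks from the held-out samples.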

Performance evaluation

Different prediction accuracy criteria are used in the literature to evaluate the performance of the ANNs [21, 106]. The criteria most often used for model performance evaluation in the publications covered by this review are listed in Table S1.
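As a hedged illustration, three criteria that recur in this review (RMSE, MAE and the coefficient of determination R2) can be computed as below. The formulas are the usual textbook definitions and are assumed here rather than taken from Table S1; individual studies may use variants, for example the correlation coefficient R instead of R2. The observed and predicted values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted removal efficiencies (%)
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
print(mae(observed, predicted))  # 1.5
```

Because RMSE and MAE carry the units of the output, they are easier to judge when reported together with the mean experimental value, as several of the studies cited below do.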

AI Techniques

AI techniques in literature applied to electrochemical processes for water and wastewater treatment processes are reviewed in this section. These include ANNs, SVM, ANFIS and metaheuristic algorithms.

ANNs

Multilayer perceptron (MLP) feedforward neural networks are the type of ANN most frequently used for modelling electrochemical processes (a description of ANNs can be found in the SI file). Single hidden layer MLP networks have been considered sufficient to correlate inputs to outputs in most of the electrochemical processes studied for water and wastewater treatment (e.g., [37, 39, 41, 57, 66, 87]). Soloman et al. [36] developed an ANN model to predict the electrooxidation of malachite green, a triphenyl methane dye, based on experimental data collected in a batch electrochemical reactor. A three layer back-propagation network with a 3:9:1 configuration was found adequate to predict the COD removal efficiency with R = 0.9987 and RMSE = 1.1428 (mean experimental value = 61.25). Also, Daneshvar et al. [56] showed the effectiveness of a three layer 7:10:1 neural network model to describe the behaviour of an electrocoagulation system for the colour removal from a textile dye solution containing C. I. BY28.

Multiple hidden layers instead of a single hidden layer were also considered for correlating inputs to outputs [42, 43, 97, 98]. Manokaran et al. [42] also used a feedforward back-propagation ANN model to predict the degradation of a distillery effluent by electrooxidation. They showed that a four layer 3:3:3:1 BP neural network had the best performance for COD removal: RMSE = 0.8633, AARE = 3.4613, R = 0.9987 compared to other configurations. Comparing regression and ANN models, Radwan et al. [86] showed that the ANN model performs slightly better (\({\mathrm{R}}_{\mathrm{regression}}^{2}\)=0.9525, \({\mathrm{R}}_{\mathrm{ANN}}^{2}\)=0.9742) for modelling an EF process for the treatment of phenolic wastewater.
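To make the configuration notation concrete, the sketch below counts the adjustable parameters implied by an input:hidden:output configuration. It assumes fully connected layers with one bias per neuron, which the cited studies may or may not use; the function name is hypothetical. The count is relevant because, as noted earlier, most data sets in this field contain fewer than 150 samples.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected feedforward network.

    layer_sizes lists the neurons per layer, e.g. [3, 9, 1] for a 3:9:1 network.
    Each layer contributes (inputs + 1 bias) * outputs parameters.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


for config in ([3, 9, 1], [7, 10, 1], [3, 3, 3, 1]):
    label = ":".join(str(n) for n in config)
    print(label, "->", count_parameters(config), "parameters")
# 3:9:1 -> 46 parameters
# 7:10:1 -> 91 parameters
# 3:3:3:1 -> 28 parameters
```

Under these assumptions, a deeper network is not necessarily a larger one: the four layer 3:3:3:1 configuration has fewer parameters than the three layer 3:9:1 configuration.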

While the previous studies examined a single optimum network for predicting the process outputs, some authors suggested using multiple networks, so-called stacked neural networks, as an ensembling method. Ensemble techniques, which have also been used with other supervised methods such as SVM, the k-nearest neighbours algorithm or decision trees [107], are based on the premise that multiple networks, rather than a single optimal network, can yield improved predictions [108]. Thus, combining the outputs of different models, each of which captures certain aspects of the process, and aggregating their information can provide more accurate predictions (Fig. 3).

Fig. 3 Schematic of the ensemble machine learning technique

Piuleac et al. [44] applied stacked neural network modelling to the electrolysis of wastes polluted with phenolic compounds, including phenol, 4-chlorophenol, 2,4-dichlorophenol, 2,4,6-trichlorophenol, 4-nitrophenol, and 2,4-dinitrophenol. In their work, various types of artificial neural networks were aggregated in a stack whose output response was a weighted sum of the individual networks. A comparison between the tested methodologies indicated that stacked neural networks or the assembly of neural networks achieved smaller validation errors, 5.8% and 4%, respectively, than a single optimal MLP neural network. Stacked neural network modelling was also examined in another study [69].
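The weighted-sum aggregation can be sketched as follows. This is a minimal illustration, not the scheme of [44]: the inverse-validation-error weighting rule, the function names and all numbers are assumptions introduced here.

```python
def inverse_error_weights(validation_errors):
    """Hypothetical weighting rule: members with lower validation error get more weight."""
    return [1.0 / e for e in validation_errors]


def stack_predict(member_predictions, weights):
    """Combine member predictions as a weighted sum, with weights normalized to sum to 1."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    n_samples = len(member_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(normalized, member_predictions))
        for i in range(n_samples)
    ]


# Hypothetical predictions of three trained networks for two samples
member_predictions = [[60.0, 80.0], [64.0, 76.0], [62.0, 78.0]]
# Hypothetical validation errors of the three networks
weights = inverse_error_weights([2.0, 4.0, 4.0])
stacked = stack_predict(member_predictions, weights)
print(stacked)  # [61.5, 78.5]
```

The weights must be chosen on data that the member networks were not trained on; otherwise the stack simply favours the member that overfits the most.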

The response surface methodology (RSM) and ANN models were compared in terms of their performance in the modelling of electrocoagulation processes [41, 61, 109]. Nourouzi et al. [61] employed a three layer ANN model to predict the removal of Reactive Black 5 dye by a sequential electrocoagulation-flocculation process. The results obtained using the ANN model were compared with the RSM and showed that both models are able to predict the process; the ANN has a slightly better performance than the RSM model (R2 = 0.9764 and 0.9446, respectively).

Within the scope of process optimization through process control, Pinto et al. [110] applied an ANN feedforward controller to a hybrid system of electrocoagulation and organic coagulation for removing Reactive Blue 5G dye from textile effluent. The ANN-based controller could manipulate the current intensity and organic coagulant dosage to act upon a disturbance in the influent dye load. In the domain of controlling electro-Fenton processes using artificial neural networks, Yu et al. [88] studied textile wastewater treatment using online monitoring of dissolved oxygen (DO) and oxidation–reduction potential (ORP). Their research was in line with their previous efforts on using artificial neural networks to control the Fenton process, both in batch and continuous operation mode, for textile wastewater treatment [111, 112]. In their study, two feedforward back-propagation ANNs were used to predict the Fe2+ dosage requirement and COD removal efficiency. One ANN, with a 4:8:1 configuration, predicted the Fe2+ dose based on the following inputs: reaction time to reach the ORP valley (min), the time for DO rising point (min), the ORP value at the ORP valley (mV), and the desired COD removal efficiency (%). Their efforts to demonstrate the ANN’s capability for EF process control were a step forward in the application of ANNs in wastewater treatment.

In the scope of utilizing artificial neural networks for process integration, Borges et al. [99] used the ANN approach to model an integrated electrodialysis and photochemical process for saline wastewater treatment. Two three layer feedforward artificial neural networks were put in series to model the photo-Fenton process. The first (4:4:1) neural network was responsible for modelling the output values of TOC/TOC0 as a function of the input parameters time, concentrations of NaCl, Fe2+, and H2O2. The output value of the first ANN was sent to the second neural network to calculate the reaction rate with input parameters TOC/TOC0, concentrations of NaCl, Fe2+, and H2O2. This model was used to design a plug flow reactor and to determine its volume (V), for different process conditions and TOC reaction rates. Their work using neural networks showed an essential step in understanding the behaviour of the integrated process.

SVR

Support vector machines (SVM), first presented by Vapnik [113], based on modern statistical machine learning techniques, have been widely applied to classification and regression problems thanks to their promising generalization performance [114]. SVM can be adopted for regression problems, thus called support vector regression (SVR). A description of the SVR algorithm and its parameters is presented in the SI file.

Curteanu et al. [19] applied two machine learning techniques (artificial neural networks and support vector machines) for the prediction of the performance of an electrooxidation method to decrease the organic compounds and remove micro-organisms from activated sludge effluent. It was reported that overall, the SVM outperformed the ANN models when comparing correlation coefficients. Farzin et al. [47] applied different approaches to data mining, including the least square support vector machine (LSSVM) used for electrochemical removal of Ciprofloxacin (CIP) as a model pollutant. LSSVM needs to solve quadratic programming with only equality constraints, or equivalently a linear system of equations, which makes it simpler and faster than SVM [115]. They showed that their tuned LSSVM model has superiority over other investigated algorithms for their problem. SVM was also used by Yuan et al. [46] for predicting the electrochemical degradation of substituted phenols by developing a quantitative structure–property relationship model. Their SVM model had a good predictive ability for the quantitative relationship between rate constants and the structure of substituted phenols with a performance of RMSE = 0.202 and R2 = 0.892.

ANFIS

ANFIS was introduced by Jang [116] as a hybrid technique of artificial intelligence that combines a Sugeno-type Fuzzy Inference System (FIS) and an artificial neural network. Details of the ANFIS can be found in the SI file.

In certain studies, ANFIS and RSM models have been compared for predicting the removal efficiency and operating costs of the electrochemical processes [117, 118]. In both studies, ANFIS models showed comparable results with the RSM models. However, it was mentioned that RSM models were built with far fewer model parameters than the ANFIS models, which could lower the uncertainties of the model given the low number of data available [118]. ANFIS, along with ANN, has also been studied for the treatment of greywater using electrocoagulation by Nasr et al. [64]. Their ANFIS application performed an exhaustive search within the available inputs to determine the most influential input attribute in predicting the turbidity removal. Current density was found to be the most influential input on turbidity removal.

A comparison of ANFIS and other AI techniques was carried out by Farzin et al. [47] for the modelling of Ciprofloxacin electrochemical removal from wastewater. The interpolation method was used as an augmentation technique to increase the number of data samples in the dataset. To select the best AI model, TOPSIS was applied with the computation time of the AI model, MAE, RMSE, and R2 as criteria. TOPSIS is one of the known multi-criteria decision-making (MCDM) methods and has been used for problems with different criteria and complicated decisions [119]. TOPSIS analysis showed that ANFIS performed better than ANN for both interpolated and original data, which was in accordance with some other studies [120, 121].
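A minimal sketch of the standard TOPSIS procedure is given below for readers unfamiliar with it. It is not the implementation of [47]: equal criterion weights, vector normalization, the model labels and every number in the decision matrix are assumptions made for illustration. Computation time, MAE and RMSE are treated as cost criteria (lower is better) and R2 as a benefit criterion.

```python
import math


def topsis(matrix, weights, is_benefit):
    """Return the TOPSIS closeness score of each alternative (row); higher is better.

    Assumes the alternatives are not all identical, so the distances are not both zero.
    """
    n_criteria = len(weights)
    # 1. Vector-normalize each criterion column and apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)] for row in matrix]
    # 2. Ideal and anti-ideal value of each criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit else min(c) for c, benefit in zip(columns, is_benefit)]
    anti_ideal = [min(c) if benefit else max(c) for c, benefit in zip(columns, is_benefit)]
    # 3. Relative closeness to the ideal solution.
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti_ideal)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical decision matrix: [computation time (s), MAE, RMSE, R2] per model
models = ["model_A", "model_B", "model_C"]
matrix = [
    [1.0, 0.5, 0.7, 0.99],
    [2.0, 1.0, 1.4, 0.95],
    [3.0, 1.5, 2.1, 0.90],
]
scores = topsis(matrix, weights=[0.25, 0.25, 0.25, 0.25],
                is_benefit=[False, False, False, True])
best_model = models[scores.index(max(scores))]
print(best_model)  # model_A
```

In this constructed example model_A is best on every criterion, so it coincides with the ideal solution. With real results the ranking depends on the chosen weights, which should therefore be reported alongside the scores.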

Metaheuristic algorithms

Metaheuristic algorithms are computational intelligence frameworks employed for solving complex optimization problems. Population-based metaheuristic algorithms, mainly GA and PSO (details are provided in the SI file), have been utilized for the optimization of electrochemical processes for water and wastewater treatment. Recently, however, other nature-inspired algorithms such as the firefly optimization algorithm (FFA) have also been utilized by researchers [47]. These optimization techniques have been applied for process output optimization and hyperparameter selection of AI models, especially in the case of ANN. Figure 4 represents optimization approaches for finding the optimal process conditions and the optimal hyperparameters of AI models.

Fig. 4 Metaheuristic algorithms for: i) finding optimal process conditions, ii) optimization of hyperparameters of ANN models

The ANN-GA approach has been applied by some authors to find the optimal conditions of electrooxidation processes [43, 122, 123]. Picos, Peralta-Hernández [123] used this approach for the prediction of discoloration of a dye by an electrooxidation process in a press-type reactor. The ANN, with MAPE = 8.3868% and RMSE = 7.5537%, was linked to GA optimization to find the best operational conditions, where the EO can reach a maximum discoloration at the lowest current density, flow rate and experimental time and at the highest dye concentration. They experimentally validated the ANN-GA result that about 95% discoloration can be obtained in an experimental time of 110 min, a flowrate of 12 Lps, a current density of 27.34 mA/cm2 and a dye concentration of about 230 mg/L. The same group studied the ANN-GA approach for the prediction of discoloration of Bromophenol blue dye for an electrooxidation process [43]. A mean discoloration efficiency of 88.8%, compared to 95.5% predicted by the model, could be obtained at the optimal conditions. These efficiencies were similar, which suggests that this AI model could be used as a helpful tool in the design, control and operation of similar EO processes for wastewaters containing similar dyes.
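The coupling of a trained model with a GA can be sketched as below. This is an illustrative real-coded GA with tournament selection, arithmetic crossover, Gaussian mutation and elitism, not the algorithm of [43, 123]. The function `surrogate` is a hypothetical quadratic stand-in for a trained ANN, and the bounds and GA settings are assumptions.

```python
import random

# Hypothetical search bounds: current density (mA/cm2) and electrolysis time (min)
BOUNDS = [(5.0, 40.0), (10.0, 120.0)]


def surrogate(x):
    """Hypothetical stand-in for a trained ANN predicting discoloration (%)."""
    current_density, time_min = x
    return 100.0 - (current_density - 25.0) ** 2 / 10.0 - (time_min - 100.0) ** 2 / 50.0


def clip(x):
    """Keep a candidate inside the search bounds."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, BOUNDS)]


def tournament(population, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=surrogate)


def genetic_algorithm(pop_size=30, generations=60, mutation_sd=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = [max(map(surrogate, population))]
    for _ in range(generations):
        elite = max(population, key=surrogate)  # elitism: the best individual survives
        children = [elite]
        while len(children) < pop_size:
            a, b = tournament(population, rng), tournament(population, rng)
            alpha = rng.random()
            # Arithmetic crossover followed by Gaussian mutation scaled to each range
            child = [alpha * u + (1.0 - alpha) * v for u, v in zip(a, b)]
            child = [v + rng.gauss(0.0, mutation_sd * (hi - lo))
                     for v, (lo, hi) in zip(child, BOUNDS)]
            children.append(clip(child))
        population = children
        history.append(max(map(surrogate, population)))
    return max(population, key=surrogate), history


best, history = genetic_algorithm()
print("best conditions:", best, "predicted discoloration:", surrogate(best))
```

Because the GA only ever queries the model, the optimum it reports is an optimum of the model, not of the process; this is why the studies above confirm the predicted conditions experimentally.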

In the scope of electrocoagulation process optimization, Taheri et al. [124] used ANN modelling and a GA algorithm to improve the Taguchi design optimization for the degradation of three different dyes, including Acid Orange 7, Acid Brown 14, and Acid Red 18 azo dyes by electrocoagulation. A GA was used for techno-economical optimization of the Taguchi design for dye removal. Their GA used the ANN model to search for the best conditions for removal efficiencies between the minimum and maximum levels of the Taguchi design. Their GA optimization results showed removal efficiencies of 96.79% and 76.74% for Acid Orange 7 and Acid Red 18, respectively, at nearly the same operating conditions. Their work illustrated the ANN and GA approach as a powerful tool for techno-economical optimization of selected dye removal using the EC process.

When there are multiple responses to consider, the problem shifts to a multi-objective optimization problem. There is no unique solution to a multi-objective optimization problem but a set of mathematically equally good solutions known as nondominated or Pareto optimal solutions. Bhatti et al. [125] used multi-objective optimization by genetic algorithms for electrocoagulation of copper from simulated wastewater. Their system was modelled by both RSM and ANN modelling approaches. Despite the limited experimental data, the 4:5:2 ANN model performed as well as the RSM (R2 = 0.993 for copper removal efficiency and R2 = 0.870 for energy consumption) to describe the nonlinearities of the electrocoagulation process, with MSE = 0.571 and combined regression coefficient of 0.982 for copper removal efficiency and energy consumption. A genetic algorithm linked to the ANN model was utilized to derive the Pareto front, which defines a set of optimum operating points with respect to removal efficiency and energy consumption. Their multi-objective optimization linked to the ANN model resulted in insight regarding the optimal operating conditions of the process. The idea of Pareto front was also applied by other researchers [126].
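The notion of nondominated solutions can be made concrete with a short sketch. The two objectives follow the copper example above (maximize removal efficiency, minimize energy consumption), but the candidate operating points and function names are hypothetical.

```python
def dominates(a, b):
    """True if point a dominates point b.

    Each point is (removal efficiency, energy consumption); removal is
    maximized and energy is minimized. a dominates b when it is no worse
    in both objectives and strictly better in at least one.
    """
    removal_a, energy_a = a
    removal_b, energy_b = b
    no_worse = removal_a >= removal_b and energy_a <= energy_b
    strictly_better = removal_a > removal_b or energy_a < energy_b
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical model predictions: (removal efficiency in %, energy consumption)
candidates = [(99.0, 3.0), (95.0, 1.5), (90.0, 2.0), (80.0, 0.8), (85.0, 0.8)]
front = pareto_front(candidates)
print(front)  # [(99.0, 3.0), (95.0, 1.5), (85.0, 0.8)]
```

The point (90.0, 2.0) is excluded because (95.0, 1.5) removes more copper with less energy. Among the three remaining points, none is better than another in both objectives, so choosing between them requires an additional preference, such as a cost limit.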

A multi-objective PSO algorithm has also been used for techno-economical optimization of the performance of combined electrocoagulation/coagulation in the removal of RB 19 from simulated wastewater using an ANFIS model [117]. The selected ANFIS model reported minimum and maximum RB 19 removal efficiencies of 58.27% and 99.67%, respectively. The difference in operating cost between the minimum and maximum dye removal efficiency levels was 0.39 US$/m3.

Inside of the black-box models

Tuning AI model parameters

AI models have inherent hyperparameters that should be tuned so that the model can optimally solve the machine learning problem. These hyperparameters control the learning process and have a direct effect on the model performance. Figure 5 shows the hyperparameters of the AI models that have been tuned in the literature.

Fig. 5 Hyperparameters of the AI models

The network configuration, i.e. the number of hidden layers and hidden neurons, has received the most attention [57, 58, 60,61,62,63, 65, 68]. In most of the studies, the coefficient of determination and MSE were chosen as criteria for network performance.
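For reference, the two performance criteria named above can be computed as in the following sketch. It uses the standard definitions of MSE and the coefficient of determination; the function names and sample values are illustrative assumptions.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted responses (assumed values)
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(mse(observed, predicted), r_squared(observed, predicted))
```

Both criteria should be reported on data not used for training, since values computed on the training set alone say little about generalization.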

Valente et al. [57] studied the prediction of COD concentration in dairy industry effluent treated by electrocoagulation using artificial neural networks. In order to select an appropriate number of neurons in the hidden layer to prevent overfitting and loss of the network’s generalization ability, several ANN architectures were evaluated using MSE and correlation coefficient as performance parameters. A neural network with 9:10:1 configuration was selected with MSE = 0.00406 and R2 = 0.9560 for the test set. According to their results based on ANN simulation, the efficiency of the COD removal can be described as a function of time, pH, current density and distance between electrodes.

Single hidden layer networks with a trial and error procedure on the network configuration were utilized for correlating inputs to outputs [35, 36]. Ahmed Basha et al. [35] used ANNs for modelling the electrooxidation process applied to an effluent of a specialty chemical manufacturer which was highly loaded with organic matter (COD: 48,000 mgL−1 and BOD5: 1100 mgL−1). In their work, a single hidden layer network with 3:7:1 configuration led to a reasonable prediction of the COD removal efficiency, with R = 0.9977 and RMSE = 0.8378 (mean experimental value = 53.59). It was shown that an increase in the number of hidden neurons can enhance the performance of the three-layered network but can have an adverse effect on the performance of the four-layered network. The importance of the number of hidden layers and hidden neurons was also investigated in other studies [36].

The trial and error procedure was also applied by other authors to determine the optimum number of hidden layer neurons based on different error functions [37, 38, 40, 42]. Sangal et al. [37] developed a three-layer ANN model to predict the removal of CBSOL LE red wool dye from wastewater by electrooxidation. The optimal 3:8:3 ANN architecture could estimate the outputs with correlation coefficients of 0.995, 0.996, 0.992, and 0.995 for the training, validation, testing, and complete data sets, respectively. It was reported that the proposed ANN could accurately simulate the outputs from given inputs.

Besides the number of hidden layers and the number of hidden neurons in each layer, which have been widely considered in ANN modelling, the initial weights are another important factor that affects the performance of the network. An improper set of initial weights can cause training to converge to a local minimum, which results in poor network performance. This effect has rarely been considered in ANN modelling studies for water and wastewater treatment using electrochemical processes. In their two studies, Sadrzadeh et al. [97, 98] took this point into account by performing 20 runs with different random initial weights for each of their network structures, which differed in the number of hidden layers and hidden neurons. This approach can reduce the uncertainties related to neural networks.
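The effect of the starting point can be illustrated with a deliberately simple sketch. It assumes a toy one-parameter loss with two minima as a stand-in for a network's error surface; the loss function, learning rate, and search interval are hypothetical choices, and only the number of runs (20) follows the studies above.

```python
import random
from typing import List


def loss(w: float) -> float:
    """Toy non-convex loss with a global minimum near w = -1.30 and a local one near w = 1.13."""
    return w ** 4 - 3.0 * w ** 2 + w


def grad(w: float) -> float:
    """Derivative of the toy loss."""
    return 4.0 * w ** 3 - 6.0 * w + 1.0


def descend(w: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent from a given initial value."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def multi_start(n_runs: int = 20, seed: int = 0) -> List[float]:
    """Repeat training from random initial values and sort the results by final loss."""
    rng = random.Random(seed)
    finals = [descend(rng.uniform(-2.0, 2.0)) for _ in range(n_runs)]
    return sorted(finals, key=loss)


runs = multi_start()
best_w = runs[0]
print(best_w, loss(best_w))
```

Runs that start on the right-hand side of the hump end in the shallower minimum, so a single run may report a misleadingly poor model; keeping the best of several runs, or reporting their spread, reduces that risk.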

They also studied the effect of different transfer functions in the hidden and output layers on the performance of the network. Transfer functions, which are applied as neuron activation functions to the sum of weighted inputs and biases, are among the neural network hyperparameters that can affect the network performance. A description of transfer functions is provided in the SI file. Piuleac et al. [44] illustrated that a combination of different transfer functions for the hidden and output layers performed better than a single transfer function for all layers. Their optimal network was then tested with real wastewater from a fine-chemicals plant and showed an average error of around 4.92% between experimental and predicted COD concentrations, which gave a very good illustration of the use of neural networks for wastewater treatment. In another study, the same team showed that the tansig transfer function for all hidden and output layers gave the best performance for the electrolysis treatment of wastewater polluted by phenol compounds [43].

Still aiming to find optimal ANN structures, da Silva Ribeiro et al. [65] studied an artificial neural network for the prediction of boron removal from mining wastewaters by electrocoagulation. Different types of transfer functions and network structures were examined in their study to compare their performance. The 3:10:1 network with a logsig transfer function in the hidden layer and a purelin transfer function in the output layer showed the best performance based on the correlation coefficient (R2) and the sum of squared error (SSE), with values of 0.973 and 0.616, respectively.
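The transfer functions named in these studies follow the usual MATLAB naming convention: tansig is the hyperbolic tangent sigmoid, logsig the logistic sigmoid, and purelin the identity. The sketch below shows how they enter a forward pass with a logsig hidden layer and a purelin output layer; the tiny 2:2:1 network and its weights are assumed values chosen for easy checking, not parameters from any cited model.

```python
import math
from typing import Callable, List


def tansig(x: float) -> float:
    """Hyperbolic tangent sigmoid, output in (-1, 1)."""
    return math.tanh(x)


def logsig(x: float) -> float:
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def purelin(x: float) -> float:
    """Linear (identity) transfer function."""
    return x


def layer(inputs: List[float], weights: List[List[float]],
          biases: List[float], transfer: Callable[[float], float]) -> List[float]:
    """Apply the transfer function to each neuron's weighted input sum plus bias."""
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hypothetical 2:2:1 network (weights[j][i] connects input i to neuron j)
hidden = layer([1.0, 2.0], [[0.5, -0.25], [0.0, 0.0]], [0.0, 0.0], logsig)
output = layer(hidden, [[1.0, 1.0]], [0.5], purelin)
print(hidden, output)
```

A bounded function in the hidden layer combined with a linear output layer lets the network output range freely, which suits regression targets such as removal efficiency or energy consumption.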

One of the most thorough studies on the effect of various network architectures and parameters on the modelling performance was performed by Hasani et al. [68] for the modelling of alternating pulse current electrocoagulation-flotation (APC-ECF) for humic acid (HA) removal. Their study focused on the effect of various network architectures and parameters (e.g., two different ANN architectures as MLP and generalized feedforward-GFF, number of hidden neurons, transfer functions, and learning parameters) on the modelling performance. Their extensive comparisons between different networks revealed that the single hidden-layer GFF NN (5:6:1), using sigmoid transfer function at both hidden and output layers and LM training algorithm, had the best performance with R2 = 0.999 and MSE = 0.00006. Their computational analysis proved that ANN-based modelling could effectively simulate the experimental data and predict the optimum conditions of the electrocoagulation/flotation process for the removal of HA from aqueous solutions.

As mentioned before, optimization techniques can be utilized to find the optimal network configuration by searching the hyperparameter space of the neural network [39, 47, 127]. Viana et al. [127] presented artificial neural networks and statistical analysis to predict and optimize the electrochemical degradation of the textile dye Reactive Black 5 using a \(\mathrm{Ti}/(\mathrm{RuO}_{2})_{0.8}-(\mathrm{Sb}_{2}\mathrm{O}_{3})_{0.2}\) electrode in a batch treatment system. By using the PSO algorithm, they optimized their neural network model parameters, including hidden neuron number, transfer function, and learning rate. Their 4:8:3 neural network could successfully predict colour removal, COD removal, and energy consumption for the textile dye Reactive Black 5 degradation with a performance of \(R_{\mathrm{test}}^{2} = 0.982\) and \(\mathrm{MSE}_{\mathrm{test}} = 0.0146\).

In the scope of metaheuristic techniques, it is worth noting that different values of the GA control parameters can have significant impacts on the optimal results obtained. Piuleac et al. [128] studied an ANN-based optimization methodology in detail, including the impacts of the genetic algorithm parameters, to optimize an electrocoagulation process involving three different pollutants of kaolin, Eriochrome Black T solutions, and an oil/water emulsion. Time, current density and initial pH were considered as decision variables for the GA optimization alongside the size of the initial population, the number of generations, crossover rate, and mutation rate as GA control parameters. To observe the impacts of these GA control parameters, they conducted different series of optimizations with different values for these control parameters. Various scenarios with different sets of GA control parameters were developed in order to select the most convenient working conditions regarding the decision variables. The ANN-GA approach was found to be an efficient optimization method for their EC process and could predict the optimal conditions for maximum removal efficiency of the three pollutants with a maximum relative error of 11.46% and an average relative error of 6.61%.
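The role of the four GA control parameters can be seen in a compact sketch. It assumes a single decision variable and a hypothetical quadratic surrogate in place of a trained ANN; the selection, crossover, and mutation operators are common textbook choices and are not claimed to be those of the cited study.

```python
import random
from typing import Callable, Tuple


def surrogate_removal(x: float) -> float:
    """Hypothetical stand-in for an ANN prediction: removal (%) peaks at x = 3."""
    return 100.0 - (x - 3.0) ** 2


def run_ga(fitness: Callable[[float], float], bounds: Tuple[float, float],
           pop_size: int = 40, generations: int = 30,
           crossover_rate: float = 0.8, mutation_rate: float = 0.1,
           seed: int = 0) -> float:
    """Real-coded GA with tournament selection, arithmetic crossover, and elitism."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: carry over the best individual
        while len(children) < pop_size:
            # Binary tournament selection of two parents
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = p1
            if rng.random() < crossover_rate:
                a = rng.random()
                child = a * p1 + (1.0 - a) * p2  # arithmetic crossover
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian mutation
            children.append(min(max(child, lo), hi))  # keep within bounds
        pop = children
    return max(pop, key=fitness)


best_x = run_ga(surrogate_removal, (0.0, 10.0))
print(best_x, surrogate_removal(best_x))
```

Rerunning `run_ga` with different values of `pop_size`, `generations`, `crossover_rate`, and `mutation_rate` mirrors the scenario analysis described above: a small population or few generations may stop short of the optimum, while a high mutation rate slows convergence.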

Regularization techniques to prevent overfitting

The selection of an appropriate number of neurons in the hidden layer is a crucial task for MLP neural networks since too many neurons can cause the so-called over-fitting problem. In this case, the fitting error on the training set will be very low due to the very successful learning process, but the error on new data presented to the network will be very high. The network has memorized the training data but has not exploited its generalization ability [57]. Usually, to obtain good network generalization, the approach is to propose a network that is just large enough to provide an appropriate fit. Although it is difficult to know in advance how large a network should be in each case, three generalization learning methods can be applied: cross-validation (early stopping), regularization, and pruning. Regularization is conducted by adding a penalty function to the training objective to minimize the complexity of the model and the prediction error at the same time, while pruning physically removes excess neurons to generate the smallest adequate network. For the cross-validation (early stopping) method, the data set is split into three non-overlapping subsets. The training dataset is utilized for learning the network parameters, the validation dataset is utilized for monitoring the training process and for approximating the generalization error, and the test dataset, a set of data not seen by the model during training, is utilized for obtaining an unbiased estimate of the generalization error of the trained network. In the early-stopping method, when the validation error rises over a number of iterations (due to over-fitting), the training algorithm stops, and the values of the weights and biases are returned to the point where the validation error was minimal [129, 130].
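The early-stopping rule described above can be sketched as follows. The sketch works on a recorded sequence of validation errors rather than a live training loop; the function name, the `patience` parameter, and the error values are illustrative assumptions.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_errors: Sequence[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch).

    Training stops once the validation error has failed to improve for
    `patience` consecutive epochs; the weights saved at best_epoch are restored.
    """
    best_err = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_errors) - 1


# Hypothetical validation errors that fall, then rise as over-fitting sets in
history = [0.90, 0.50, 0.30, 0.25, 0.27, 0.26, 0.30, 0.35, 0.40]
best_epoch, stop_epoch = early_stopping_epoch(history, patience=3)
print(best_epoch, stop_epoch)
```

In a real training loop the same bookkeeping is combined with a snapshot of the weights and biases taken whenever the validation error improves, so that the snapshot from `best_epoch` can be restored after stopping.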

While the early-stopping method has been used in most of the studies in the domain, Gholami Shirkoohi et al. [41] recently applied the regularization method in their modelling and optimization methodology for active chlorine production using the electrolysis process. Learning curves were used to diagnose whether there was a high bias (underfit) problem or a high variance (overfit) problem. In the presence of a high variance problem, introducing a regularization factor can help. Regularization makes slight modifications to the learning algorithm such that the model generalizes better and its performance on unseen data improves. They showed that utilizing learning curves along with regularization factor analysis can help to obtain reliable ANN models to predict the production of active chlorine and energy consumption using an electrolysis process.
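As an illustration of how a regularization factor enters the training objective, one common form is the L2 (weight decay) penalty. This specific form is an assumption made here for illustration, since the penalty used in the cited study is not stated in this section; \(N\) is the number of training samples, \(y_i\) and \(\hat{y}_i\) are the observed and predicted outputs, \(w_j\) are the network weights, and \(\lambda \ge 0\) is the regularization factor:

\[
J(\mathbf{w}) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2} + \lambda \sum_{j} w_j^{2}
\]

With \(\lambda = 0\) the objective reduces to the ordinary mean squared error. Increasing \(\lambda\) penalizes large weights and therefore reduces variance at the cost of some bias, which is why the value of \(\lambda\) is typically chosen by comparing training and validation errors.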

Sensitivity analysis

The coefficients between artificial neurons that result from ANN training are comparable to the synaptic strengths between the axon and dendrites of a biological neuron in the brain. As in real life, these weights determine what fraction of the incoming signal will be transferred to the neuron's body [20]. Despite the fact that ANNs are black boxes, the neural connection weight matrix can be used to determine the relative importance of each input independent variable on the desired output. Garson [131] and Goh [132] proposed a method for partitioning the connection weights in order to determine the relative importance of the various inputs. Essentially, this strategy entails dividing each hidden neuron's hidden-output connection weights into components related to each input neuron [133].
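The weight-partitioning idea can be sketched for a single-hidden-layer network with one output. The code follows the commonly cited form of the Garson equation, in which each hidden neuron's output weight is shared among the inputs in proportion to their absolute input-hidden weights; the function name, the argument layout, and the example weights are assumptions for illustration.

```python
from typing import List


def garson_importance(w_ih: List[List[float]], w_ho: List[float]) -> List[float]:
    """Relative importance (%) of each input.

    w_ih[i][j] is the weight from input i to hidden neuron j;
    w_ho[j] is the weight from hidden neuron j to the single output.
    """
    n_in, n_hid = len(w_ih), len(w_ho)
    contrib = [0.0] * n_in
    for j in range(n_hid):
        col_sum = sum(abs(w_ih[i][j]) for i in range(n_in))
        if col_sum == 0.0:
            continue  # a hidden neuron with no input weights carries no information
        for i in range(n_in):
            # Share of hidden neuron j's output weight attributed to input i
            contrib[i] += abs(w_ih[i][j]) / col_sum * abs(w_ho[j])
    total = sum(contrib)
    return [100.0 * c / total for c in contrib]


# Hypothetical weights for a 2:2:1 network (assumed values)
w_input_hidden = [[0.5, -1.0], [1.5, 1.0]]
w_hidden_output = [2.0, -1.0]
importance = garson_importance(w_input_hidden, w_hidden_output)
print(importance)
```

Because absolute values are used, the result indicates the magnitude of each input's influence but not its direction, and it depends on the particular set of trained weights.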

Belkacem et al. [38] reported that neural network modelling could effectively forecast the electrooxidation of oxytetracycline (OTC) in a batch process using a platinized titanium anode. They showed that the reaction time has the most influence on the process output, with a relative importance of 50.70%, followed by the current intensity and the nature of the electrolyte, with 15.24% and 14%, respectively. For an electrocoagulation process, Aber et al. [58] modelled the removal of Cr(VI) from polluted solutions using artificial neural networks, and the results showed that all input variables have significant effects on the removal of Cr(VI). In further work, Bui [63] applied artificial neural networks to predict the dye removal efficiency (colour and COD) of electrocoagulation for a Sunfix Red S3B aqueous solution. A sensitivity analysis showed that the efficiency of the EC process is highly dependent on current density, electrolysis time and initial pH for colour removal, whereas it is highly dependent on initial dye concentration, sulphate concentration, and electrolysis time combined with the initial pH for COD removal.

For EF processes, one study showed that while all of the independent variables have a strong influence on the output, the initial pH is slightly more influential for the PEF/TiO2 process [84]. Conversely, time and current intensity were the two most important parameters for the phenol removal using the EF process [86]. These two parameters were also shown to be the most influential factors in an EF process for the treatment of composting plant leachate [87].

The relative importance of each input variable on the desired output, as obtained by the Garson algorithm, can help the ANN modelling approach to provide meaningful insights into the process, which in experimental studies are usually obtained with the well-known RSM approach. Gholami Shirkoohi et al. [41] showed that electrolysis time and current intensity have about 81.5% influence on active chlorine production, compared to an 82.8% influence in the factorial design analysis using RSM. The H3O+ and NaCl concentrations accounted for the remaining 18.5% of the investigated response. They reported that their findings are similar to the RSM outcomes, showing the compatibility and reliability of the ANN model results.

Conclusions and future perspectives

Based on the extensive literature reviewed, it was observed that artificial intelligence techniques have demonstrated their potential for modelling, performance prediction and optimization of electrochemical processes used for water and wastewater treatment processes. The following conclusions can be drawn from the literature reviewed:

  • AI techniques have been employed mainly in four electrochemical processes for water and wastewater treatment: electrooxidation, electrocoagulation, electro-Fenton, and electrodialysis. Since the data used for modelling and optimization of these processes are derived mainly from experimental studies, the majority of the research used AI methods on data sets with a small number of samples (150).

  • While the usage of AI models is becoming more prevalent in several scientific disciplines, including electrochemical processes, the reliability of the developed models is still critical, owing to the limited data available. Although data augmentation techniques have been used in machine learning in various disciplines, particularly image processing and speech recognition, they should be used with caution for the regression of experimental work with limited data. This is because the behaviour of outputs in experimental studies might be far more complex than predetermined interpolation functions can describe, while data generated by such functions would clearly be easy for the AI model to predict. Therefore, tuning AI model hyperparameters and using regularization techniques to prevent overfitting should be the principal focus.

  • ANNs have been the prevalent technique to model electrochemical processes. This could be related to their inherent capabilities to discover patterns between inputs and outputs, even in complex nonlinear processes. Multilayer feedforward neural networks with back-propagation training were widely used in treatment applications.

  • Metaheuristic optimization algorithms have been applied for process output optimization and hyperparameter selection of AI models. For finding the optimal process conditions, both single-objective and multi-objective optimization approaches were utilized in the literature.

  • Despite the black-box nature of ANNs, there have been some efforts to interpret the process under study with the relative importance of each input independent variable on the desired output. This can be further developed by using AI models to show the main effects of each independent variable on the response of the system, which is usually represented by the RSM approach.

Though AI techniques appear to be a promising alternative to traditional linear, parametric, and phenomenological methods for modelling and optimization of the electrochemical processes used in water and wastewater treatment, there are still some areas requiring further research:

  1. Tuning the AI model parameters which control the learning process and have a direct effect on the model performance is a crucial aspect. In the case of ANNs, the selection of optimum network parameters such as the number of hidden layers, number of neurons in hidden layers, learning rate, momentum factor, transfer functions, and learning algorithms remains a major task in ANN modelling, and the usual way to overcome these difficulties is the trial and error method. Only a few studies so far have used optimization algorithms such as PSO to optimize the ANN model structure and parameters.

  2. Most of the studies reviewed considered single neural networks for modelling and predicting the performance of their systems. The downside of this approach is that, as neural networks are sensitive to the training data and to their initialization, they may find a different set of weights each time they are trained. This leads to different predictions each time and to high variance. Ensemble modelling, which consists of training multiple models instead of a single model and combining their predictions, is one of the proposed approaches to overcome this challenge. It can be conducted by:

    • single learning algorithm, different data sets;

    • single learning algorithm, different configuration options;

    • different algorithms.

      Only a few studies applied ensemble modelling approaches like stacked neural networks in the reviewed papers, but they showed good performance.
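The first ensemble option (single learning algorithm, different data sets) can be sketched with bootstrap resampling. For brevity, the sketch assumes a least-squares line as the base learner in place of a neural network; the data, the number of models, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Model = Tuple[float, float]  # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0, my  # degenerate resample: fall back to the mean response
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bagged_models(xs: Sequence[float], ys: Sequence[float],
                  n_models: int = 25, seed: int = 0) -> List[Model]:
    """Train one model per bootstrap resample of the data set."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def ensemble_predict(models: Sequence[Model], x: float) -> float:
    """Combine the members by averaging their predictions."""
    return sum(a * x + b for a, b in models) / len(models)


# Hypothetical data: y = 2x + 1 with a small alternating disturbance
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
models = bagged_models(xs, ys)
y_hat = ensemble_predict(models, 4.5)
print(y_hat)
```

The spread of the individual member predictions at a given input also gives a rough indication of model uncertainty, which a single network cannot provide.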

  3. So far, most of the relevant studies have been performed with conventional feedforward ANNs trained with the BP algorithm. However, with the advances in machine learning, BP-MLP neural networks with regular activation functions and long training times may not be the best option. Further research is still required to apply different activation functions (e.g., the rectified linear activation function or ReLU), different machine learning algorithms (e.g., SVM, decision trees) or different neural networks and variations (e.g., GRNN, ANFIS) for the modelling and optimization of electrochemical processes used in water and wastewater treatment.