1 Introduction

Predicting river water quality is now routine practice, especially in hydrology and environmental science. Clean water is essential to all life: no living being, whether animal, plant or human, can survive without it. Besides drinking, numerous sectors of the economy, viz. manufacturing and commerce, agriculture, hydroelectric power supply, fisheries and animal husbandry, depend on a clean river water supply. Water, and river water in particular, therefore plays an important role (Tyagi et al. 2013).

As urbanization and population growth have increased, the demand for fresh water has risen greatly over the past several decades (Al-Badaii et al. 2013). Water pollution is the presence of toxic or harmful substances in water at levels that may be detrimental to living beings (Abba et al. 2020). Rivers are highly likely to be polluted by heavy metals and other contaminants resulting from human activities. River systems are therefore at high risk from environmental pollution, both because rivers are easily accessed for waste disposal and because of the dynamic nature of rivers themselves (Ahmed et al. 2019).

These contaminants and pollutants have deteriorated river water quality. Two main groups of factors affect water quality: natural factors (hydrological, climatic and geological) and human factors (Sami et al. 2021). Human factors are usually the contaminants and pollutants resulting from rapid urbanization, agriculture and livestock farming. Suitable measures are therefore needed to manage river water quality and protect it from pollution.

Artificial Intelligence (AI) has been widely used by scientists and investigators around the world to predict river water quality parameters. Lafdani et al. (2013) stated that the growth of AI has made a difference to prediction, with AI serving as an estimator for hydrological phenomena. The main advantage of AI models is that, once hydrological data are introduced to a model, it is able to learn or discover the behaviour of the system.

An abundance of AI models has been developed to predict river water quality. Adaptive Neuro-fuzzy Inference System (ANFIS), Fuzzy Logic (FL), Support Vector Machine (SVM), Artificial Neural Network (ANN), Radial Basis Neural Network (RBNN) and Multilayer Perceptron (MLP) are examples of AI models that can be applied to time series modelling based on historical data (Rizal 2020). ANFIS is a machine learning method built as a feed-forward multilayer neural network that combines fuzzy logic and ANN. To produce a nonlinear input–output relationship, ANFIS uses ANN together with a learning algorithm for fuzzy logic systems (Azad et al. 2019).
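To make the layered structure concrete, the following is a minimal sketch of the forward pass of a first-order Sugeno-type ANFIS for a single sample. It is an illustration only: the Gaussian membership functions, the function names (`gaussian_mf`, `anfis_forward`) and all parameter values are assumptions and are not taken from the cited studies.

```python
import numpy as np

def gaussian_mf(x, centre, sigma):
    """Gaussian membership degree of x in a fuzzy set (assumed MF shape)."""
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

def anfis_forward(x, centres, sigmas, consequents):
    """Forward pass of a first-order Sugeno-type ANFIS for one sample.

    x           : (n_inputs,) input vector
    centres     : (n_rules, n_inputs) membership-function centres
    sigmas      : (n_rules, n_inputs) membership-function widths
    consequents : (n_rules, n_inputs + 1) linear coefficients, bias last
    """
    x = np.asarray(x, dtype=float)
    # Layer 1: fuzzify each input for each rule.
    mu = gaussian_mf(x, centres, sigmas)          # shape (n_rules, n_inputs)
    # Layer 2: rule firing strength as the product of membership degrees.
    w = mu.prod(axis=1)
    # Layer 3: normalise the firing strengths so they sum to one.
    w_bar = w / w.sum()
    # Layer 4: linear (first-order) consequent of each rule.
    f = consequents[:, :-1] @ x + consequents[:, -1]
    # Layer 5: weighted sum of the rule outputs.
    return float(w_bar @ f)

# Hypothetical two-rule, two-input system with constant rule outputs 1 and 3.
centres = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.ones((2, 2))
consequents = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 3.0]])
y_mid = anfis_forward([0.5, 0.5], centres, sigmas, consequents)
```

In this sketch an input halfway between the two rule centres fires both rules equally, so the output is the average of the two rule outputs. During training, the membership-function parameters and the consequent coefficients are the quantities that the learning algorithm adjusts.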

Previous studies have also reported good results when using ANFIS to predict water quality. For example, Abba et al. (2017) used ANN and ANFIS techniques to predict the concentration of dissolved oxygen (DO) in the Yamuna River and found that ANFIS outperformed ANN. For ANFIS, the authors obtained a correlation coefficient (R) of 0.94 and a root mean square error (RMSE) of 0.7 in the calibration phase, and an R of 0.81 and an RMSE of 1.38 in the validation phase. Although ANFIS performed better, the authors concluded that ANN can still be applied in modelling the DO concentration in the river (Abba et al. 2017). Abba et al. (2019), Sonmez et al. (2018) and Ranković et al. (2012) also predicted river water quality using only historical data and achieved good results. In the current research, ANFIS models have been established for the prediction of six different water quality parameters in the Langat River, Malaysia. The study area and methods are explained in Sect. 2. The modelling results are presented in Sect. 3 and discussed in Sect. 4. The conclusions of the study are given in Sect. 5.

2 Study Area and Methods

The study area is the Langat River, Malaysia, shown in Fig. 1. Historical data (from 1981 to 2019) on the river water quality parameters were retrieved from the water quality station (Station No. 2917601) of the Department of Irrigation and Drainage (DID), Malaysia. The water quality parameters used as inputs for the modelling were magnesium, pH, total solid (TS), conductivity, colour, ammonia, nitrate, turbidity, dissolved solid (DS), chloride, solids, alkalinity, fluoride, calcium, hardness, biochemical oxygen demand (BOD5day), chemicals, potassium, sodium, manganese, silica, iron, phosphate, total suspended solid (TSS) and sulphate. Because of missing values, only 161 data records were available for all 25 parameters. Basic statistics of the raw data are shown in Table 1.

Fig. 1 The study area

Table 1 Basic statistics of the raw data of the water quality parameters in Langat River

The ANFIS models were developed using the Neuro-Fuzzy Designer app in MATLAB 2020b. All historical data were cleaned and pre-processed to avoid errors while running the models. Missing values in the historical data were replaced with a constant value of zero. The data were then normalized to the range [0, 1] and divided into three sets: training (70%), testing (15%) and checking (15%). Because of the large input data, the subtractive clustering method was used when developing the ANFIS models. Backpropagation was chosen as the optimization method, with 100 training epochs.
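The pre-processing steps described above can be sketched as follows. This is a hedged illustration in Python rather than the MATLAB workflow used in the study: the function name `preprocess`, the use of NaN to mark missing values, the random shuffling before the split and the rounding of the split sizes are all assumptions, since the text does not specify them.

```python
import numpy as np

def preprocess(data, seed=0):
    """Zero-fill, min-max normalise and split a (n_samples, n_features) array.

    NaN is assumed to mark a missing value. Returns (train, test, check).
    """
    data = np.asarray(data, dtype=float)
    # Replace missing values with a constant zero, as stated in the text.
    filled = np.where(np.isnan(data), 0.0, data)
    # Min-max normalisation of each column to the range [0, 1].
    col_min = filled.min(axis=0)
    col_max = filled.max(axis=0)
    # Guard against constant columns to avoid division by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    scaled = (filled - col_min) / span
    # Shuffle the rows (an assumption) and split 70% / 15% / 15%.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(scaled))
    n_train = int(round(0.70 * len(scaled)))
    n_test = int(round(0.15 * len(scaled)))
    train = scaled[idx[:n_train]]
    test = scaled[idx[n_train:n_train + n_test]]
    check = scaled[idx[n_train + n_test:]]
    return train, test, check

# Synthetic stand-in for the 161 x 25 data set (values are made up).
demo_rng = np.random.default_rng(1)
raw = demo_rng.uniform(0.0, 50.0, size=(161, 25))
raw[demo_rng.random(raw.shape) < 0.1] = np.nan
train, test, check = preprocess(raw)
```

Note that zero-filling treats a missing measurement as a true zero, which can shift the column minimum used in the normalisation; this follows the procedure as described and is not a recommendation.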

3 Results

In the current research, six ANFIS models were established to predict six different water quality parameters, viz. nitrate, phosphate, BOD, TS, DS and TSS. Table 2 shows the results of the ANFIS models for their respective targets.

Table 2 The outcomes of the ANFIS models according to the respective targets

Table 2 shows that ANFIS Model 5 achieved the highest R² value, 0.9712. It also obtained low RMSE values of 0.0028, 0.0144 and 0.0924 for the training, testing and checking data sets, respectively. In contrast, ANFIS Model 3, which was used to predict DS, achieved the lowest R² (0.4662). ANFIS Models 1, 2, 4 and 6 also obtained satisfactory outcomes, with R² values of 0.9501, 0.8903, 0.8342 and 0.7588, respectively. All of the ANFIS models obtained low RMSE values for all data sets. Figure 2 shows the regression plot of ANFIS Model 5, which was used to predict nitrate.

Fig. 2 The regression plot for ANFIS Model 5
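For reference, the performance measures reported in this section can be computed as in the sketch below. The text does not state which definition of R2 was used, so both the Pearson correlation coefficient R and the coefficient of determination are shown; the function names and the sample values are hypothetical.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def pearson_r(observed, predicted):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(observed, predicted)[0, 1])

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up values: every prediction is too high by exactly one unit.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
error = rmse(observed, predicted)
r = pearson_r(observed, predicted)
r2 = r_squared(observed, predicted)
```

In this made-up case the predictions are perfectly correlated with the observations but systematically biased, so R stays at 1 while the coefficient of determination drops well below 1. This is why R (or its square) and RMSE are best read together when judging a regression plot.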

4 Discussion

The results in Sect. 3 show that most of the ANFIS models performed very well. Most of the models also showed a good line of best fit in their regression plots and achieved high prediction accuracy. Past research has likewise shown that ANFIS can produce good results in predicting river water quality. Ranković et al. (2012) used the ANFIS method to predict dissolved oxygen (DO) in the Gruža reservoir in Serbia, with eight water quality parameters as model inputs. Comparing measured and ANFIS-predicted DO values, the authors reported mean square error (MSE) and mean absolute error (MAE) values of 0.6670 and 1.23 for the test set and 1.0373 and 2.1831 for the training and test set (Ranković et al. 2012). Meanwhile, Sonmez et al. (2018) applied ANFIS to the prediction of cadmium (Cd) concentrations in the Filyos River, Turkey. The performance measures used were R², mean absolute deviation (MAD), mean absolute percentage error (MAPE), MSE and Nash–Sutcliffe efficiency. A relatively high correlation (R² = 0.91) was found between modelled and observed Cd concentrations, indicating that the ANFIS model gave good estimates of cadmium concentrations with a high degree of robustness and accuracy (Sonmez et al. 2018). Abba et al. (2019) used ANFIS, autoregressive integrated moving average (ARIMA) and ANN models to model the water quality index (WQI) of the Yamuna River, India and the Kinta River, Malaysia. Their results showed that ANFIS-III, with 5 input parameters, a triangular membership function and 2 membership functions (5, trimf, 2), was the best model for predicting WQI in both the calibration and verification phases for both the Kinta and Agra stations. ANFIS thus gave the best performance in predicting the water quality index for both regions (Abba et al. 2019).

Therefore, ANFIS models have been shown to be an effective approach to predicting water quality parameters in rivers, since they can provide high-accuracy predictions with low errors.

5 Conclusion

Six ANFIS models were successfully developed for the prediction of six different water quality parameters in the Langat River. Five of the ANFIS models achieved high R² values and low RMSE values in all three data sets. ANFIS Model 3 was the only model that obtained poor results and low accuracy in predicting river water quality. In future studies, the models can be upgraded to more advanced predictive models for water quality parameters.