Introduction

Surface water quality assurance is a major issue across the globe because of various anthropogenic and natural activities. Aquatic ecosystems depend mainly on the quality of water, but aquatic life is deteriorating because of water contamination (Mumtaz and Cheema 2023). Regions of extensive aquaculture significantly affect nearby water bodies, causing both point and non-point sources of water quality degradation (Nagaraju et al. 2022). Wastewater released from extensive aquaculture operations, containing excess nutrients, sediments, and pollutants, is a diffuse source of pollution compared with other non-point sources (Nagaraju et al. 2022). These pollutants are widely distributed, making it difficult to identify individual polluted sites. Unused feed, excrement, and surplus nutrients from aquaculture operations can generate runoff that contaminates nearby water bodies and enriches them with nutrients (Bauer 2023; Nagaraju et al. 2024). In areas with intense aquaculture, point sources of pollution can also be identified at specific outlets, such as discharge pipes from aquaculture facilities (Nagaraju et al. 2022). Such wastewater effluents, high in nutrients, organic waste, and possibly antibiotics, may directly affect water bodies (Seeboonruang 2012) and can have a localized environmental impact. Precisely identifying point sources makes targeted monitoring and mitigation possible; non-point source monitoring, however, is a challenging task (Shen et al. 2020). Planning for sustainable water resources has become increasingly difficult, especially in nations with large inland aquaculture zones and aquaculture processing sectors (Boyd et al. 2020). Effective water management is critical to the sustainability of aquaculture operations and to the circular economy (see Fig. 1), because it reduces waste and maximizes resource efficiency.

Fig. 1 Schematic view of the circular economy in inland aquaculture water

India, the world’s second-largest producer of inland aquaculture, has been exporting more shrimp in recent years (Sahoo et al. 2023). Aquaculture’s share of production has increased markedly, from 13.4% in 1990 to 49.2% in 2020 (Boyd et al. 2020). India has 14 major, 44 medium, and innumerable minor rivers that contribute to the production of fish and other aquatic species, and almost 62% of its total fish production comes from inland aquaculture (Sahoo et al. 2023). Although aquaculture is intended to enhance food security and employment, in its early days it posed various negative impacts and challenges (De Silva 2012). About 43% of aquatic food for human consumption, such as fish, shrimp, crustaceans, and mollusks, is based on inland aquaculture alone (Verdegem et al. 2023). Induced carp breeding has transformed this sector into an emerging commercial industry (Trottet et al. 2022). Coastal inland aquaculture in India is classified into “Pokkali” fields in Kerala, “Bheries” in West Bengal, “Ghazanis” in Karnataka, and “Khazan” in Goa. Moreover, shrimp farming is carried out throughout the year in perennial fields (Edwards et al. 2019). However, the improper disposal of organic matter and nutrients from fish farming can lead to surface water contamination (Mehrim and Refaey 2023). Natural aquatic habitats can also be disturbed by altered flow patterns and the introduction of non-native species. The major issues in coastal aquaculture can be categorized as post-harvest losses, overfishing, habitat degradation, climate change, and poor implementation of regulations (Mohanty et al. 2017). Seasonal changes strongly affect aquatic species at both local and global scales, and these environmental changes impose physiological stress on the growth and behavior of fish and shrimp (Walker et al. 2019).
Metabolic activity is high in the pre-monsoon season and relatively low in the monsoon and post-monsoon seasons. The feed conversion ratio is very high in summer compared with other seasons. Feeds and culture methods are also varied with the season to maximize production (Islam and Chuenpagdee 2022). Increased aquaculture activity results in the annual outflow of lakhs (hundreds of thousands) of m³ of effluent water, which contains significant levels of sludge, pathogens, nitrogen, phosphorus, and biodegradable organic matter (Nagaraju et al. 2022). Nagaraju et al. (2022) attributed the elevated ammonia levels in Andhra Pradesh’s western Godavari delta region, unlike other areas, to intensive aquaculture. Furthermore, both point discharges and diffuse inputs of aquaculture effluents significantly affect surface water quality. Aquaculture effluents contain high concentrations of ammonia, phosphorus, calcium, organic matter, and suspended particles (Nagaraju et al. 2023a).

The intensity of aquaculture farming varies with the seasons, and this dynamic is closely correlated with variations in the surrounding water quality (Edwards 2015). Favorable seasons, marked by ideal weather, largely determine how intensive aquaculture operations are, and farmers frequently take advantage of these conditions to increase production and maximize yields (Edwards 2015). Unfavorable seasons, on the other hand, may require modified farming techniques to minimize potential difficulties. The seasonal fluctuations in aquaculture intensity are reflected in corresponding changes in water quality measures (Nagaraju et al. 2024). These variations reflect complex interactions between aquaculture practices and seasonal environmental change, and they highlight the need for season-specific management plans. Despite the delta’s vital significance to Andhra Pradesh’s economy, unchecked operations have led to rising pollution levels, especially after the monsoon (Nagaraju et al. 2022), which poses a severe threat to water resource management and agricultural output. The ecosystem services of the Godavari Delta region have been overexploited, and eutrophication has resulted in low oxygen and high nutrient levels in the water (Nagaraju et al. 2022). Sustainable management is therefore essential to mitigate the long-term negative impacts on surface water quality in intensive aquaculture zones (Ahmad et al. 2021).

Water quality is conventionally assessed through laboratory analysis, with the results compared against standards prescribed by government agencies (Ewuzie et al. 2021). Unfortunately, this approach does not express water quality in simple terms. Many tools have therefore been developed for timely water quality surveillance, one of the most popular being the water quality index (WQI) (Ighalo et al. 2021; Syeed et al. 2023). Horton first introduced the WQI in 1965. The WQI enables the public, aquaculturists, and environmentalists to understand the significance of water quality (Aziz et al. 2021). Its use has grown worldwide because of its simple application and ease of interpretation (Rathnayake 2015; Seifi et al. 2020; Makubura et al. 2022; Uddin et al. 2023a). Developing a WQI typically involves four simple steps. First, the parameters that significantly affect water quality in the study area are selected. Second, these parameters are transformed into dimensionless sub-index values. Third, appropriate weightage factors are assigned, and finally, an aggregation function combines the sub-indices into a single WQI (Unda-Calvo et al. 2020; Sheikh Khozani et al. 2022; Patel et al. 2023a). More than 30 WQIs have been developed worldwide, and all of them are region-specific, based on the opinions of experts and local agencies (Gupta and Gupta 2021; Uddin et al. 2021). Several researchers have extended Horton’s work to develop indices that are more reliable for specific regions (Evans et al. 2019). Thus, although several WQIs are used globally, no single WQI is used universally (Akhtar et al. 2021; Chathuranika et al. 2023; Siriwardhana et al. 2023).

WQIs fall into two categories: relative indices and absolute indices. Relative indices focus on the ecosystem, whereas absolute indices concentrate on water quality criteria (Pak et al. 2021). The weighted arithmetic WQI is one of the most popular absolute indices and is widely used for its simplicity and suitability for surface water quality assessment (Banda and Kumarasamy 2020; Zotou et al. 2020; Oladipo et al. 2021; Masood et al. 2022). Large uncertainties arise at different stages of WQI model development (Uddin et al. 2022a), and WQIs do not address gaps in the dataset. Machine learning (ML) is extensively used to address such model uncertainty (Mohanta et al. 2020): ML algorithms can estimate confidence intervals for a dataset and reduce model uncertainty (Shanbehzadeh et al. 2022). Researchers have used various ML techniques, including random forest (RF), support vector machine (SVM), artificial neural network (ANN), multi-linear regression (MLR), and meta-heuristic algorithms (Heddam 2021; Shafiei et al. 2022; Uddin et al. 2022b; Kamyab et al. 2023; Uddin et al. 2023b). Among these approaches, MLR is the conventional method, in which a dependent variable is predicted from independent variables. It focuses on linear relationships between the variables and the influencing factors (Das et al. 2023) and cannot capture complex non-linear relationships. Extensive research has examined ANN and SVM, which are well suited to non-linear problems (Otchere et al. 2021). However, ANN is prone to overfitting the training data, whereas SVM follows the structural risk minimization principle, which helps it overcome inadequacies in the dataset (Huang and Zhao 2018).
The main idea of SVM is to use a subset of the training inputs, the support vectors, to define the boundary that separates the classes in the data. SVM can solve both linear and non-linear problems (Azimi-Pour et al. 2020). SVM minimizes a bound on the overall error, whereas ANN minimizes the error only on the training data. Because of this advantage, SVM is widely used to analyze complex datasets (Kurani et al. 2023). This research predicts seasonal stream water quality by comparing mathematical and ML models for streams influenced by intensive aquaculture.

Thus, this research aims to offer a thorough understanding of how the seasons affect stream water quality by examining critical components, spatial variations, prediction models, and plausible mechanisms. Eleven water quality parameters were used to quantify stream water quality in the western Godavari delta region, which served as the study site. The study has three goals: (1) to determine how extensive aquaculture affects the surrounding freshwater streams, (2) to specify how stream water quality varies seasonally, and (3) to develop predictive models, with implications for water resource planning, that anticipate stream water quality under the seasonal fluctuations of intensive aquaculture zones.

Study area

The Godavari River in southern India holds significant importance but faces contamination from diverse sources. Spanning 1465 km, it ranks as the third-largest river in India, traversing major states, industrial hubs, and urban centers before merging into the Bay of Bengal. This study focuses on the western delta of the Godavari River, located in the tropics between latitudes 16°15′00″ and 17°30′00″ N and longitudes 80°50′00″ and 81°55′00″ E (see Fig. 2).

Fig. 2 Western Godavari delta region and water sample locations

Situated in the southern part of Andhra Pradesh’s West Godavari district, this area is vibrant with agriculture, aquaculture, and industrial activities, relying primarily on the Godavari River for water supply (Nagaraju et al. 2022, 2024). Bounded by the Godavari River to the east, Eluru Stream to the north, and the Upputeru River and Kolleru Lake to the west, the western Godavari delta has been transformed into perennial crop fields through the construction of an anicut at Dowlaiswaram (refer to Fig. 2). Table 1 summarizes information on the streams. Water depths in the region range from 1.5 to 4.5 m, and average annual precipitation is 1035 mm. Agriculture and aquaculture are the primary economic activities, supporting around 523 settlements. However, the rapid conversion of agricultural fields into aquaculture ponds has degraded stream water quality, posing a significant threat to the local ecosystem. The region has historically practiced intensive shrimp aquaculture, initially with tiger shrimp and, from the 2000s, with whiteleg shrimp (Vannamei) (Mantena et al. 2023a; Nagaraju et al. 2023b). Following Nagaraju et al. (2022), the study area is divided into three aquaculture farming zones: Zone-I (traditional farming), Zone-II (semi-intensive farming), and Zone-III (intensive farming), as illustrated in Fig. 2. Zone-I employs traditional methods with low stocking densities and minimal inputs. Zone-II uses semi-intensive techniques with moderate stocking densities and increased inputs, while Zone-III uses intensive practices with high stocking densities and large inputs to maximize production. This shift toward intensive farming led to the discharge of elevated levels of ammonia, nitrogen, and phosphorus into the mainstream of the western Godavari delta catchment (Nagaraju et al. 2022, 2024).
The resultant high ammonia concentration contributed to stream nitrification, causing low dissolved oxygen levels, particularly in the summer when abundant phytoplankton further strained water quality, violating prescribed limits for minimum dissolved oxygen and maximum pH values (Mantena et al. 2023b; Nagaraju et al. 2022, 2024).

Table 1 Western Godavari delta region streams information

In the study area, farmers cultivate shrimp at varying intensity levels, defined by shrimp density per acre, and adjust the cultivation period to meet market demand (Nagaraju et al. 2022). Typically, after 2 months in the grow-out pond, shrimp reach a marketable size of 20–25 g with good survival rates. In intensive farming ponds, a 3-month period is sufficient for one successful crop, allowing up to three crops per year. However, high stocking density and intensity are major contributors to water contamination in ponds, so productivity must be balanced against environmental sustainability. Average water use per crop across intensity levels was 3.35 × 10⁴ m³/ha, and larger water exchange volumes correlate with higher total water use. Most annual rainfall occurs during the monsoon season (mid-June to mid-October), which affects the hydrology. Water lost through exchange was approximately 0.85 m³ per kg of shrimp produced, and post-harvest effluent was discharged into nearby streams.
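The water-use figures above can be combined in a quick back-of-the-envelope calculation. The Python sketch below is illustrative only: the helper names are hypothetical, and multiplying the all-intensity average per-crop use by the three crops per year quoted for intensive ponds assumes that intensive ponds draw that average volume, which the text does not state.

```python
"""Rough water-budget arithmetic from the figures quoted in the study-area text."""

WATER_USE_PER_CROP_M3_PER_HA = 3.35e4  # average use per crop across intensity levels
EXCHANGE_LOSS_M3_PER_KG = 0.85         # water lost through exchange per kg of shrimp
MAX_CROPS_PER_YEAR = 3                 # intensive ponds: ~3-month cycle, up to 3 crops/year


def annual_water_use_m3_per_ha(crops_per_year: int = MAX_CROPS_PER_YEAR) -> float:
    """Hypothetical helper: annual use if every crop draws the average per-crop volume."""
    if crops_per_year < 0:
        raise ValueError("crops_per_year must be non-negative")
    return crops_per_year * WATER_USE_PER_CROP_M3_PER_HA


def exchange_loss_m3(shrimp_kg: float) -> float:
    """Hypothetical helper: exchange water lost for a given shrimp harvest in kg."""
    return EXCHANGE_LOSS_M3_PER_KG * shrimp_kg


if __name__ == "__main__":
    print(f"3 crops/year: {annual_water_use_m3_per_ha():,.0f} m3/ha")
    print(f"1 t of shrimp: {exchange_loss_m3(1000):,.0f} m3 lost to exchange")
```

With the defaults, `annual_water_use_m3_per_ha()` returns 3 × 3.35 × 10⁴ = 100,500 m³/ha, roughly 1.0 × 10⁵ m³/ha per year; this is an order-of-magnitude estimate, not a value reported by the study.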

Methodology

In this study, a modified WQI was used to infer the water quality status of four streams, sampled at 78 stations, that are the principal water sources for the intensive inland aquaculture zone of the western Godavari delta (see Fig. 2). Numerical and ML approaches were then used to evaluate the water quality results comprehensively.

Data preparation

Water samples were collected from 78 sampling stations across four streams: Eluru stream (17 stations), Gostanadi stream (12 stations), Narsapuram stream (24 stations), and Venkayya Vayerru stream (25 stations). Samples were gathered during the pre-monsoon (February to May), monsoon (June to October), and post-monsoon (November to January) seasons from 2021 to 2023. Controlled discrete sampling was conducted during morning hours (7 a.m. to 8 a.m.); this standardization is crucial for obtaining comparable and reliable data across sampling days and locations. The timing of sampling can significantly influence the accuracy and reliability of water quality data, particularly in aquaculture zones, because critical parameters such as pH, dissolved oxygen (DO), and biochemical oxygen demand (BOD) exhibit diurnal fluctuations. These fluctuations are driven primarily by the photosynthetic activity of algae and other aquatic plants, which depends on sunlight and nutrient availability. Water samples were collected and analyzed according to the drinking water specification IS 10500 (BIS 2012). At each sampling station along the canal, water was collected in 2 L bottles from a depth of one foot. All samples were stored carefully in polyethylene bottles, transported to the laboratory, and analyzed within 5 days. A total of 11 physical and chemical parameters were evaluated against prescribed standards: pH, total dissolved solids (TDS), electrical conductivity (EC), hardness, calcium, magnesium, chlorides, DO, BOD, alkalinity, and nitrates, analyzed according to American Public Health Association (APHA) standard procedures (Sharma et al. 2021).
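Because the post-monsoon window wraps the calendar year (November to January), grouping samples by season needs an explicit month-to-season rule. The sketch below encodes the windows stated above; the function names and the example readings are hypothetical, and how January samples were assigned to a sampling year is not stated in the text.

```python
"""Assign sampling dates to the study's three seasons by calendar month."""
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def season_of(month: int) -> str:
    """Map a calendar month (1-12) to the season windows stated in the text."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    if 2 <= month <= 5:
        return "pre-monsoon"   # February to May
    if 6 <= month <= 10:
        return "monsoon"       # June to October
    return "post-monsoon"      # November, December, and January


def group_by_season(records: Iterable[Tuple[int, float]]) -> Dict[str, List[float]]:
    """Group hypothetical (month, value) records into per-season lists."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for month, value in records:
        grouped[season_of(month)].append(value)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical DO readings (mg/L) keyed by sampling month, not study data.
    print(group_by_season([(1, 6.9), (3, 2.4), (7, 5.1), (12, 7.2)]))
```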
WQIs are simplified representations of complex realities, or water quality models, in which variables are selected and weighting and aggregation techniques are specified (Patel et al. 2023b). The weighted arithmetic water quality index (WAWQI) technique evaluates water quality according to its degree of purity using the most commonly measured water quality parameters (Patel et al. 2023a; Sinha 2023). It has been widely used in surface water research and under various other conditions, including groundwater studies (Patel et al. 2023b). As mentioned earlier, eleven water quality parameters were analyzed.

WAWQI is calculated according to the standards of the Indian Council of Medical Research (ICMR) and BIS. In this technique, the quality ratings of the parameters are aggregated using their weightage factors to obtain the index, as given in the following equation:

$$WAWQI = \frac{\sum (Q_{r} \cdot W_{i})}{\sum W_{i}}$$
(1)

The quality rating for each attribute is obtained as follows:

$$Q_{r} = 100 \times \frac{V_{ni} - V_{io}}{S_{n} - V_{io}}$$
(2)

where Vni is the measured value of the parameter, Vio is its ideal value as per BIS and ICMR standards, Sn is the standard value for each attribute, and Wi is the recommended weightage factor for each attribute. Vio is zero for all parameters except pH and DO, for which it is 7.0 and 14.6 mg/L, respectively. For predicting WQI, the following input variables are considered: pH (X1), EC (X2), TDS (X3), alkalinity (X4), hardness (X5), Ca (X6), Mg (X7), Cl (X8), NO3 (X9), DO (X10), and BOD (X11).
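Equations (1) and (2) translate directly into code. The sketch below is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the weighting rule Wi = K/Sn with K = 1/Σ(1/Sn) is an assumption (a convention commonly paired with WAWQI), since the text only calls Wi the "recommended weightage factor". The standards in the example are placeholders, not BIS or ICMR limits.

```python
"""Weighted arithmetic WQI following Eqs. (1) and (2)."""
from typing import Dict

# Ideal values Vio from the text: 7.0 for pH, 14.6 mg/L for DO, zero otherwise.
IDEAL_VALUES: Dict[str, float] = {"pH": 7.0, "DO": 14.6}


def quality_rating(observed: float, standard: float, ideal: float = 0.0) -> float:
    """Eq. (2): Qr = 100 * (Vni - Vio) / (Sn - Vio).

    For DO the standard lies below the ideal value, so lower DO yields a higher Qr.
    """
    return 100.0 * (observed - ideal) / (standard - ideal)


def wawqi(observed: Dict[str, float], standards: Dict[str, float]) -> float:
    """Eq. (1): sum(Qr * Wi) / sum(Wi) over the parameters in `standards`.

    ASSUMPTION: Wi = K / Sn with K = 1 / sum(1 / Sn); the text does not give
    the weighting rule explicitly.
    """
    k = 1.0 / sum(1.0 / sn for sn in standards.values())
    weighted_sum = 0.0
    weight_total = 0.0
    for name, sn in standards.items():
        wi = k / sn
        qr = quality_rating(observed[name], sn, IDEAL_VALUES.get(name, 0.0))
        weighted_sum += qr * wi
        weight_total += wi
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Hypothetical standards and measurements, not values from this study.
    standards = {"pH": 8.5, "BOD": 5.0}
    sample = {"pH": 8.5, "BOD": 2.5}
    print(round(wawqi(sample, standards), 2))  # 68.52
```

With K chosen this way the weights already sum to 1, so the denominator of Eq. (1) equals 1; it is kept so the code mirrors the equation for any other weighting rule.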

Gaussian elimination model (GEM)

In this method, the system of linear equations is arranged as a matrix and solved by row reduction. For 11 input attributes, 11 equations are formed, and pivoting and row-echelon reduction are carried out with elementary row operations (Beauregard 2007). One way to organize the dataset is to group the variables’ coefficients and results systematically in an augmented matrix (Kaparthi and Suresh 1994; Wu et al. 2014). For each season’s dataset, the augmented matrix was reduced to row-echelon form by elementary row operations. Once the matrix is in row-echelon form, each variable is solved for in the back-substitution step, beginning with the final row. Natural logarithm (loge) values were computed from the experimental results. Because there are 11 independent variables and one dependent variable, a set of 11 linear equations is created, as Eqs. (3) to (5) illustrate. The solutions found via Gaussian elimination are the coefficients of the linear equation that best fits the dataset.

$$\sum WQI = a\sum X_{1} + b\sum X_{2} + c\sum X_{3} + d\sum X_{4} + e\sum X_{5} + f\sum X_{6} + g\sum X_{7} + h\sum X_{8} + i\sum X_{9} + j\sum X_{10} + k\sum X_{11}$$
(3)
$$\begin{array}{c}\sum WQI \cdot X_{2} = a\sum X_{1}X_{2} + b\sum X_{2}^{2} + c\sum X_{3}X_{2} + d\sum X_{4}X_{2} + e\sum X_{5}X_{2} + f\sum X_{6}X_{2} +\\ g\sum X_{7}X_{2} + h\sum X_{8}X_{2} + i\sum X_{9}X_{2} + j\sum X_{10}X_{2} + k\sum X_{11}X_{2}\end{array}$$
(4)
$$\begin{array}{c}\sum WQI \cdot X_{3} = a\sum X_{1}X_{3} + b\sum X_{2}X_{3} + c\sum X_{3}^{2} + d\sum X_{4}X_{3} + e\sum X_{5}X_{3} + f\sum X_{6}X_{3} + g\sum X_{7}X_{3} +\\ h\sum X_{8}X_{3} + i\sum X_{9}X_{3} + j\sum X_{10}X_{3} + k\sum X_{11}X_{3}\end{array}$$
(5)

The other eight equations are formed in a similar fashion. Figure 3 shows the methodology of GEM.
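The procedure described by Eqs. (3) to (5) can be sketched in Python with NumPy. As we read those equations, row 0 of the system sums the model equation and each remaining row multiplies it by X2, ..., X11 before summing; the solver then applies forward elimination with partial pivoting (an assumption, since the text says only "pivoting operations") followed by back-substitution. The function names are hypothetical, and the sketch does not apply the loge transformation mentioned above; it works on whatever values it is given.

```python
"""Gaussian elimination for the GEM system of Eqs. (3)-(5), as we read them."""
import numpy as np


def gem_system(X: np.ndarray, wqi: np.ndarray):
    """Build A @ coef = b in the pattern of Eqs. (3)-(5).

    Row 0 sums the model equation (Eq. 3); row j >= 1 multiplies it by X_{j+1}
    before summing (Eq. 4 uses X2, Eq. 5 uses X3, and so on).
    """
    X = np.asarray(X, dtype=float)
    wqi = np.asarray(wqi, dtype=float)
    multipliers = np.column_stack([np.ones(len(X)), X[:, 1:]])  # 1, X2, ..., Xp
    A = multipliers.T @ X    # A[j, k] = sum(m_j * X_k)
    b = multipliers.T @ wqi  # b[j] = sum(m_j * WQI)
    return A, b


def gaussian_elimination(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Forward elimination with partial pivoting, then back-substitution."""
    aug = np.hstack([np.asarray(A, dtype=float), np.asarray(b, dtype=float).reshape(-1, 1)])
    n = aug.shape[0]
    for col in range(n):
        pivot = col + int(np.argmax(np.abs(aug[col:, col])))  # largest |entry| as pivot
        if np.isclose(aug[pivot, col], 0.0):
            raise ValueError("matrix is singular or nearly singular")
        aug[[col, pivot]] = aug[[pivot, col]]  # swap the current row with the pivot row
        for row in range(col + 1, n):
            factor = aug[row, col] / aug[col, col]
            aug[row, col:] -= factor * aug[col, col:]
    coef = np.zeros(n)
    for row in range(n - 1, -1, -1):  # back-substitution, starting from the final row
        coef[row] = (aug[row, -1] - aug[row, row + 1:n] @ coef[row + 1:]) / aug[row, row]
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_demo = rng.uniform(1.0, 10.0, size=(40, 11))  # synthetic stand-in for X1..X11
    true_coef = rng.normal(size=11)
    wqi_demo = X_demo @ true_coef                    # noiseless synthetic target
    A, b = gem_system(X_demo, wqi_demo)
    print(np.allclose(gaussian_elimination(A, b), true_coef))
```

On noiseless synthetic data the true coefficients satisfy every equation of the system, so the solver recovers them up to rounding, and `np.linalg.solve(A, b)` serves as a cross-check. Because Eq. (3) sums the model rather than multiplying it by X1, this system is not in general the ordinary least-squares normal equations, so on noisy data its coefficients can differ from an MLR fit.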

Fig. 3 Flowchart of GEM methodology

Because GEM is designed for linear systems, it does not naturally handle complex non-linear relationships among input variables, although it can be one component of a larger methodological framework for approximating solutions of non-linear systems. Furthermore, GEM does not scale well to large problems (Golabian et al. 2022). The potential of GEM for WAWQI prediction in this study is assessed in detail in the results section.

Multi-linear regression (MLR)

MLR is a multivariate modeling technique that quantifies relationships between input and output variables without explaining the reasons behind those relationships. MLR can be applied with several functional forms, including linear, logarithmic, exponential, and power functions. According to earlier studies, the most common approach for evaluating the site quality index was single-parameter maximum likelihood estimation. However, MLR has successfully analyzed multiparameter datasets and produced better-fitting WAWQI models, with higher precision than single-parameter models. This study adopted MLR to model the relationship between the WAWQI and the water quality measures. In addition to the full set of 11 characteristics recommended by the standard WAWQI technique, subsets of significant parameters were also considered for predicting WAWQI. The MLR model equation is as follows:

$$\begin{array}{c}MLRWQI=Co+C1.X1+C2.X2+C3.X3+C4.X4+C5.X5+C6.X6+C7.X7\\ +C8.X8+C9.X9+C10.X10+C11.X11\end{array}$$
(6)

where MLRWQI is the predicted WQI (output), X1, X2, X3, …, X11 are the input variables, and Co, C1, C2, …, C11 are the regression coefficients.
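A minimal least-squares sketch of Eq. (6) follows, assuming ordinary least squares (the text does not name the estimator). The function names are hypothetical. The reduced model in the example only illustrates how a subset of "significant" parameters could be fitted; the text does not say how significance was judged.

```python
"""Ordinary least-squares fit of the MLR model in Eq. (6)."""
import numpy as np


def fit_mlr(X: np.ndarray, wqi: np.ndarray) -> np.ndarray:
    """Return [Co, C1, ..., Cp] minimizing the squared error of Eq. (6)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column of ones carries Co
    coef, *_ = np.linalg.lstsq(design, np.asarray(wqi, dtype=float), rcond=None)
    return coef


def predict_mlr(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Evaluate Eq. (6) for each row of X."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_demo = rng.uniform(0.0, 10.0, size=(50, 11))  # synthetic stand-in for X1..X11
    wqi_demo = 12.0 + X_demo @ rng.normal(size=11) + rng.normal(scale=0.5, size=50)
    full = fit_mlr(X_demo, wqi_demo)
    reduced = fit_mlr(X_demo[:, [0, 9, 10]], wqi_demo)  # hypothetical subset: pH, DO, BOD
    print(full.round(3), reduced.round(3))
```

`np.linalg.lstsq` is used instead of solving the normal equations explicitly because it is numerically more stable when inputs such as EC and TDS are strongly correlated.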

Support vector regression (SVR)

SVR, the regression extension of SVM, has been applied effectively to numerous engineering problems, including prediction of the water quality index. Given a learning dataset S = {(m1, n1), (m2, n2), …, (mX, nX)}, where mi is a vector of input variables and ni ∈ ℝ is the corresponding scalar output value, the objective of an SVR model is to fit a regression function n = f(m) that accurately predicts the targets {ni} from the input samples {mi}. For real-world nonlinear problems, where the desired output cannot be linearly related to the input data, a nonlinear mapping function g(m) transforms the inputs into a high-dimensional feature space, in which the following linear model is built:

$$f(m)=\sum_{i=1}^{x}\omega_{i}\,g_{i}(m)+c$$
(7)

where ωi are the weight coefficients, gi(m) are the features produced by the nonlinear mapping g, and c is the bias. SVR operates under the principle of structural risk minimization: the coefficients ωi and c are estimated by minimizing a regularized risk function rather than the training error alone.
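A minimal SVR sketch using scikit-learn is shown below. The text does not state the kernel, hyperparameters, or input scaling, so the RBF kernel (which plays the role of the mapping g(m) in Eq. (7)), standardized inputs, and the values of C and epsilon are all assumptions; the function name and synthetic data are hypothetical.

```python
"""Support vector regression sketch with scikit-learn."""
import numpy as np
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_svr(X: np.ndarray, wqi: np.ndarray, C: float = 100.0, epsilon: float = 0.1):
    """Fit an RBF-kernel SVR on standardized inputs.

    C trades model flatness against training error (the structural-risk balance),
    and epsilon sets the width of the error-insensitive tube. Values are illustrative.
    """
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, epsilon=epsilon))
    model.fit(X, wqi)
    return model


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X_demo = rng.uniform(0.0, 1.0, size=(80, 3))  # synthetic inputs, not study data
    y_demo = 50 + 10 * X_demo[:, 0] - 5 * X_demo[:, 1] ** 2 + 3 * X_demo[:, 2]
    svr = fit_svr(X_demo, y_demo)
    print(f"training R^2: {r2_score(y_demo, svr.predict(X_demo)):.3f}")
```

Standardizing first matters because the RBF kernel is distance-based and the raw parameters span very different ranges (for example, pH near 8 versus EC in the hundreds). In practice C and epsilon would be tuned with cross-validation on held-out data rather than judged by training R².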

Seasonal fluctuations of aquaculture zone waters

Water resource pollution poses a significant challenge in the inland aquaculture zones of coastal India, affecting both human health and the environment. This study focused on the western Godavari delta region, where contaminated water is a primary concern. To assess the impact, seasonal variations in physicochemical parameters such as pH, temperature, electrical conductivity, and dissolved oxygen were investigated in water samples from four streams. Sampling during the pre-monsoon, monsoon, and post-monsoon periods captured the dynamic changes in water quality across seasons. The results revealed distinct seasonal variations in the physicochemical composition of the water. Eleven key parameters were analyzed: pH, EC, TDS, alkalinity, hardness, Ca, Mg, Cl, nitrate, DO, and BOD. This analysis provides crucial insight into present water conditions, serves as baseline data for future assessments, and can aid in developing strategies to ensure access to safe drinking water in developing countries. Seasonal changes in water bodies are influenced by precipitation and temperature, but aquafarming intensity also emerged as a significant factor in the study area. Reduced river flows during summer (pre-monsoon) would lead to more extensive and severe contamination. This would in turn reduce the area available for cultivation, cause crop losses and lower yields, and have severe local socio-economic and environmental impacts, along with significant negative consequences for the national economy (Nagaraju et al. 2022). Streamflow volume also affects water quality, influencing both pollutant concentrations and total loads. In the study area, aquaculture catchments dominated all four streams, and the observed changes in stream water quality related to eutrophication and nutrient transport depended strongly on low stream flows. Markogianni et al. (2017) suggest that nutrient loadings to coastal delta regions vary more with extensive aquaculture farming than with streamflow volume. Extended drought periods in the pre-monsoon season noticeably affect water quality, as reduced stream flow can lead to higher peak contaminant concentrations. Table 2 presents summary statistics of the physicochemical parameters for each season.

Table 2 Statistical data of water quality parameters with varying seasons

Maintaining ideal pH levels is essential to sustain aquatic life, including plants and animals. The pH, which reflects the hydrogen ion concentration, measures how acidic or basic a solution is and is essential to the biochemical and physicochemical processes in aquatic environments. The pH of the examined stream waters ranged from 6.74 to 10.60, with a mean of 8.62 during the pre-monsoon season, indicating an alkaline environment. Seasonal fluctuations in pH were observed; in the post-monsoon season, they are associated with changes in oxygen levels and slower metabolic rates in colder water. Alkalinity, a measure of natural salts in stream water, reflects cations such as Ca and Mg combined with CO3 and HCO3, or sometimes OH. Alkalinity varies seasonally, being higher during the monsoon and lower during winter, which matters for the buffering capacity of streams (Panikkar et al. 2022). Alkalinity also supports phytoplankton and thus contributes to the productivity of aquatic ecosystems.

Chloride, a common contaminant, combines with elements such as calcium, magnesium, sodium, or potassium to form salts such as CaCl2, MgCl2, NaCl, and KCl. Its concentration is affected by rainfall and atmospheric conditions, with higher levels found in drainage carrying aquaculture effluents. High chloride levels, reaching 595 mg/L in the pre-monsoon season, are positively correlated with suspended particles and pH and affect the production/respiration ratio in aquatic systems. Stream water hardness, caused mainly by the sulfates and chlorides of Ca and Mg, varies, especially in the pre-monsoon season. Hardness showed dynamic variations influenced by season and salt concentration but remained below WHO standards.

Nitrate levels in the pre-monsoon season are of serious concern. In aquatic systems, nitrite (NO2) is a less frequent but important form of nitrogenous waste that is usually produced when ammonia is not completely oxidized. Nitrite poisoning poses a significant risk to aquatic life because it disrupts the blood’s ability to carry oxygen: nitrite oxidizes haemoglobin to methaemoglobin, which cannot carry oxygen, causing methaemoglobinaemia (Ciji and Akhtar 2020). Environments with poor water quality or insufficient biological filtration typically have higher nitrite levels, which worsen nitrite toxicity. Nitrate, by contrast, is less toxic than nitrite or ammonia. It is the end product of nitrification, in which nitrifying bacteria convert ammonia to nitrite and then to nitrate (Ciji and Akhtar 2020). High nitrate levels can still harm aquatic life, however, causing long-term stress, stunted growth, and compromised immune systems. Elevated nitrate concentrations can also lead to eutrophication, which lowers oxygen concentrations in water and creates dead zones uninhabitable by aquatic organisms. Because inland aquaculture with high stocking densities involves little water exchange, nitrate can build up over time, so nitrate toxicity is typically the greater concern in this setting.

DO, which influences the performance of aquatic organisms, is an essential water quality indicator. DO levels in the streams ranged from 1.8 to 7.6 mg/L. The lowest DO levels were consistently observed during the pre-monsoon period, and the highest were recorded post-monsoon. The low pre-monsoon DO indicates poor water quality, a decline attributed to phytoplankton in the streams, and the water was therefore deemed unsuitable for drinking. Conversely, DO peaks at all locations during the post-monsoon period. This is consistent with the fact that the colder water prevalent in this season can hold more dissolved oxygen than pre-monsoon water, contributing to improved water quality. In short, DO peaks in the post-monsoon and falls in the pre-monsoon, when low levels may affect the aquatic ecosystem; DO concentrations are also influenced by the intensity of aquaculture farming. The 5-day BOD of the pond water reflects the difference between the initial DO content and the DO content after five days. BOD values, which range from 1.2 to 13.5 mg/L, decrease as the intensity of surrounding aquaculture decreases. Seasonal changes in BOD concentrations suggest pollution from aquaculture effluents. In a recent study of the same area, Nagaraju et al. (2024) reported that water bodies face organic pollution, indicated by elevated BOD levels that were particularly prevalent during the pre-monsoon period. They attributed this to the heightened metabolic activity of both aerobic and anaerobic microorganisms at elevated temperatures, exacerbated by the lack of water flow in inland aquaculture ponds during this period.

Similarly, the more concentrated aquaculture effluents discharged from the affected crops into nearby streams had a pronounced impact. This effect was particularly notable during the pre-monsoon period, when flow in all streams in the delta region was sharply reduced or absent.
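
The 5-day BOD relation mentioned above can be sketched as follows, assuming the standard dilution-method calculation for an unseeded sample, BOD5 = (D1 − D2)/P, where D1 and D2 are the DO immediately after preparation and after 5 days of incubation, and P is the decimal volumetric fraction of sample in the BOD bottle. The study's laboratory protocol (dilution, seeding) is not described in this section, so the function and its parameter names are illustrative.

```python
def bod5_unseeded(d1_mg_l: float, d2_mg_l: float, sample_fraction: float = 1.0) -> float:
    """Return 5-day BOD (mg/L) for an unseeded, possibly diluted sample.

    d1_mg_l: DO (mg/L) immediately after preparation (day 0)
    d2_mg_l: DO (mg/L) after 5 days of incubation
    sample_fraction: decimal volumetric fraction of sample in the bottle (P)
    """
    if not 0.0 < sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be in (0, 1]")
    if d2_mg_l > d1_mg_l:
        raise ValueError("final DO cannot exceed initial DO")
    # Oxygen consumed during incubation, scaled back up to the undiluted sample.
    return (d1_mg_l - d2_mg_l) / sample_fraction


if __name__ == "__main__":
    print(bod5_unseeded(8.0, 5.5))        # undiluted sample
    print(bod5_unseeded(8.0, 5.5, 0.5))   # same DO drop, 50% dilution
```

For example, an undiluted sample whose DO falls from 8.0 to 5.5 mg/L gives a BOD5 of 2.5 mg/L, while the same DO drop at 50% dilution corresponds to 5.0 mg/L.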

Gaussian elimination model–based WQI

The GEM approach to predicting WQI uses linear algebra to discretize the WQI equation into a system of linear equations. This system is solved with a Gauss-Jordan elimination calculator (Atasoy et al. 2012) to obtain the final model coefficients, which are given in the following equations for pre-monsoon, monsoon, and post-monsoon, respectively.

$$\begin{array}{c}\mathrm{GEMWQI}_{\text{pre-monsoon}} = -2.1214X_{1} + 2.7279X_{2} - 2.1419X_{3} + 0.1309X_{4}\\ + 2.125X_{5} - 1.035X_{6} - 1.321X_{7} + 0.289X_{8} - 0.2368X_{9} - 0.186X_{10}\\ - 0.011\,\mathrm{BOD}\end{array}$$
(8)
$$\begin{array}{c}\mathrm{GEMWQI}_{\text{monsoon}} = -0.0014X_{1} - 1.0235X_{2} + 1.0392X_{3} + 0.0004X_{4}\\ + 2.4356X_{5} - 1.085X_{6} - 0.946X_{7} + 0.041X_{8} - 0.033X_{9} - 0.729X_{10}\\ - 0.0003X_{11}\end{array}$$
(9)
$$\begin{array}{c}\mathrm{GEMWQI}_{\text{post-monsoon}} = 0.903X_{1} + 0.184X_{2} + 0.227X_{3} - 0.0009X_{4} - 0.0014X_{5}\\ + 0.027X_{6} + 0.009X_{7} - 0.0044X_{8} - 0.005X_{9} - 0.043X_{10}\\ + 0.221X_{11}\end{array}$$
(10)
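
The section does not spell out how the linear system behind Eqs. (8)-(10) is assembled. A minimal sketch, assuming the GEM step amounts to solving a square system A c = w, where each row of A holds the input values of one observation and w holds the corresponding WAWQI values, is shown below using Gauss-Jordan elimination with partial pivoting. Under that assumption, 11 coefficients with no intercept would require exactly 11 observations. The function name and example system are hypothetical.

```python
from typing import List


def gauss_jordan_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    if any(len(row) != n for row in a) or len(b) != n:
        raise ValueError("a must be n x n and b must have length n")
    # Augmented matrix [a | b] as floats, so the caller's lists are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest |value| in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row, above and below the pivot;
        # this is what distinguishes Gauss-Jordan from plain Gaussian elimination.
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    # The left block is now the identity, so the last column is the solution.
    return [row[n] for row in m]


if __name__ == "__main__":
    # Small hypothetical 3 x 3 system with solution (2, 3, -1).
    coeffs = gauss_jordan_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
    print([round(c, 6) for c in coeffs])  # [2.0, 3.0, -1.0]
```

Because the system is square, the solution reproduces the training rows exactly. This may be one reason such a model degrades when fewer inputs are available, since dropping columns changes the system itself.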

The GEM equations obtained for the three seasons can be checked against the data used to train the GEM model. Values derived from these equations agree closely with those calculated by the WAWQI approach, and the discrepancies between model and measured values are small. Figure 4 shows no substantial discrepancy between GEM values and calculated WQI values, with an average R2 of 0.97. The model has distinct merits: it is the simplest method for characterizing how a dependent variable varies with a set of independent variables. However, it performs poorly on datasets with limited input variables, even though it gives suitable WQI estimates when all inputs are available. GEM-based results in this study are therefore reported only for WQI predictions that use all 11 input variables.
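
The WAWQI reference values are defined in the paper's methods, which are outside this excerpt. A minimal sketch of the widely used weighted arithmetic formulation is given below, assuming sub-indices Qi = 100 (Vi − V0)/(Si − V0), unit weights Wi = K/Si with K = 1/Σ(1/Si), and WQI = ΣQiWi/ΣWi. The standards Si and ideal values V0 in the usage example are illustrative placeholders, not the study's values.

```python
from typing import Dict, Tuple


def wawqi(measured: Dict[str, float], standards: Dict[str, Tuple[float, float]]) -> float:
    """Weighted arithmetic water quality index.

    measured:  parameter name -> observed value Vi
    standards: parameter name -> (permissible standard Si, ideal value V0)
    """
    # Proportionality constant K = 1 / sum(1/Si), which makes the unit weights sum to 1.
    k = 1.0 / sum(1.0 / s_i for s_i, _ in standards.values())
    num = 0.0
    den = 0.0
    for name, (s_i, v0) in standards.items():
        q_i = 100.0 * (measured[name] - v0) / (s_i - v0)  # quality rating (sub-index)
        w_i = k / s_i                                      # unit weight
        num += q_i * w_i
        den += w_i
    # den equals 1 up to rounding; dividing keeps the general form of the formula.
    return num / den


if __name__ == "__main__":
    # Illustrative placeholder standards, not the study's: pH (8.5, ideal 7.0), TDS (500 mg/L, ideal 0).
    stds = {"pH": (8.5, 7.0), "TDS": (500.0, 0.0)}
    print(wawqi({"pH": 7.75, "TDS": 250.0}, stds))
```

In this example both sub-indices equal 50, so the index is 50 up to floating-point rounding. Because each Qi is affine in Vi and the weights depend only on the standards, WQI in this formulation is itself an affine function of the measured values. Under that assumption, a linear model can in principle reproduce it closely.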

Fig. 4

GEM analysis results showing actual and predicted WQI

MLR and SVR model–based WQI

In this study, two machine learning techniques, MLR and SVR, were used. Both regression approaches generate WQI predictions for all three seasonal scenarios, and forecast accuracy is evaluated using MAPE and R2. Figure 5 shows that the predicted series (red line) overlaps the actual values (blue line), indicating near-perfect agreement between predictions and observations.
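
The two accuracy measures can be sketched as follows. The section does not state which R2 definition was used, so this sketch assumes the coefficient of determination, 1 − SSres/SStot, and MAPE expressed in percent. Both functions are illustrative, not the authors' code.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    wqi_actual = [100.0, 200.0, 300.0]     # hypothetical WQI values
    wqi_predicted = [110.0, 190.0, 300.0]
    print(mape(wqi_actual, wqi_predicted), r_squared(wqi_actual, wqi_predicted))
```

Here MAPE is about 5% and R2 is 0.99. Some studies instead report the squared Pearson correlation as R2, which can differ from this definition for biased predictions.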

Fig. 5

Actual and predicted sets using MLR: a pre-monsoon, b monsoon, and c post-monsoon

The MLR equations built from the fitted regression coefficients are given below for pre-monsoon, monsoon, and post-monsoon, respectively.

$$\begin{array}{c}\mathrm{MLRWQI}_{\text{pre-monsoon}} = -26.6190 + 9.4354X_{1} + 0.0819X_{2} + 0.0021X_{3}\\ + 0.0085X_{4} - 0.0012X_{5} + 0.0269X_{6} + 0.1137X_{7} + 0.0021X_{8}\\ + 0.0825X_{9} - 2.6882X_{10} + 4.9560\,\mathrm{BOD}\end{array}$$
(11)
$$\begin{array}{c}\mathrm{MLRWQI}_{\text{monsoon}} = -30.4961 + 9.7687X_{1} + 0.0827X_{2} + 0.0004X_{3} + 0.0086X_{4}\\ + 0.0013X_{5} + 0.0223X_{6} + 0.1362X_{7} + 0.0019X_{8} + 0.0613X_{9} - 2.5951X_{10}\\ + 4.5897X_{11}\end{array}$$
(12)
$$\begin{array}{c}\mathrm{MLRWQI}_{\text{post-monsoon}} = -31.03 + 9.7103X_{1} + 0.0890X_{2} + 0.0082X_{3} + 0.0175X_{4}\\ + 0.0087X_{5} + 0.0282X_{6} + 0.1407X_{7} + 0.0066X_{8} + 0.0722X_{9} - 2.7004X_{10}\\ + 5.0363X_{11}\end{array}$$
(13)

The equations above can be checked by substituting the data used to train the MLR model. For each season, the predicted values agree closely with the WQI values calculated by the WAWQI approach, and the differences are minor.
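
A minimal sketch of how Eq. (13) would be evaluated is shown below. The mapping of X1 to X11 onto physical parameters and their units is defined earlier in the paper; inputs must use the same units as in model training. The input vector in the usage example is a hypothetical placeholder, not study data.

```python
from typing import Sequence

# Intercept and coefficients of Eq. (13), post-monsoon MLR model, in the order X1..X11.
POST_MONSOON_INTERCEPT = -31.03
POST_MONSOON_COEFS = (9.7103, 0.0890, 0.0082, 0.0175, 0.0087, 0.0282,
                      0.1407, 0.0066, 0.0722, -2.7004, 5.0363)


def predict_linear(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate intercept + sum(coef_i * x_i) for one observation."""
    if len(coefs) != len(x):
        raise ValueError(f"expected {len(coefs)} inputs, got {len(x)}")
    return intercept + sum(c * v for c, v in zip(coefs, x))


if __name__ == "__main__":
    # Hypothetical placeholder inputs for X1..X11; replace with measured values.
    x_example = [7.5, 800.0, 500.0, 150.0, 200.0, 60.0, 30.0, 100.0, 40.0, 5.0, 4.0]
    print(predict_linear(POST_MONSOON_INTERCEPT, POST_MONSOON_COEFS, x_example))
```

The same function evaluates Eqs. (11) and (12) once their intercepts and coefficient tuples are substituted.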

Developing SVR-based models is similar to developing tree- or forest-based models, in that both require numerous trials to find optimal values of the user-defined parameters. The agreement plot in Fig. 6 shows that almost all values predicted by the SVR-based models converge to the actual values. The performance indices in Table 3 show that MLR and SVR consistently achieve an R2 of 0.99 across all scenarios.

Fig. 6

Actual and predicted sets using SVR: a pre-monsoon, b monsoon, and c post-monsoon

Table 3 Performance metrics of MLR and SVR models

Each model has specific advantages. MLR is valued for its interpretability and computational efficiency and suits cases where the relationships between multiple independent variables and a dependent variable are reasonably linear. SVR, by contrast, excels at capturing non-linear relationships and is robust to outliers because it relies on support vectors. In WQI prediction, both models give similar forecasts because WQI is computed from the measured parameters, and both offer viable alternatives to the mathematical Gaussian elimination method. The next section discusses the choice between conventional WAWQI, GEM-based WQI, and machine learning approaches such as MLR in complex inland aquaculture scenarios.
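
For reference, the standard ε-insensitive support vector regression (ε-SVR) problem is given below. The kernel, C, and ε actually used in this study are not reported in this section, so the formulation is generic rather than the authors' exact configuration:

$$\begin{array}{c}\min_{\mathbf{w},\,b,\,\xi,\,\xi^{*}}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_{i}+\xi_{i}^{*}\right)\\ \text{subject to}\quad y_{i}-\langle\mathbf{w},\phi(\mathbf{x}_{i})\rangle-b \le \varepsilon+\xi_{i},\quad \langle\mathbf{w},\phi(\mathbf{x}_{i})\rangle+b-y_{i} \le \varepsilon+\xi_{i}^{*},\quad \xi_{i},\,\xi_{i}^{*}\ge 0\end{array}$$

$$f(\mathbf{x}) = \sum_{i=1}^{n}\left(\alpha_{i}-\alpha_{i}^{*}\right)K(\mathbf{x}_{i},\mathbf{x}) + b,\qquad 0 \le \alpha_{i},\,\alpha_{i}^{*} \le C$$

Only training points on or outside the ε-tube receive nonzero multipliers and become support vectors. The term ½‖w‖² controls the flatness of the regression function, and the bound C limits how strongly any single point, including an outlier, can pull on the fit.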

Impact of input parameters and importance of ML models in WQI

This section highlights the importance of machine learning models for water quality assessment and management in various scenarios. In this study, MLR and SVR were applied to predict WQI using the most influential water quality parameters in the inland aquaculture zone. Combining crucial parameters into a single number makes the index easy to interpret and a valuable tool for stream management. When the dissolved oxygen content falls too low, prawns suffocate, and fish and other inhabitants of aquatic ecosystems likewise depend on oxygen to survive. Low dissolved oxygen concentrations are also a harbinger of potential contamination and a crucial factor in evaluating water quality. In intensive aquaculture, farmers frequently apply artificial dissolved oxygen powders to maintain desired DO levels (Nagaraju et al. 2022). The index also includes parameters such as pH, EC, TDS, hardness, and magnesium. TDS measures the quantity of dissolved organic and inorganic salts, and water loses its capacity to sustain various aquatic species when it becomes too murky. Hardness and magnesium in aquaculture waters may be associated with carbonates and bicarbonates. Finally, excess feed, especially its protein content, significantly affects BOD levels in aquaculture water. Protein content also affects overall total nitrogen and total phosphorus levels, and excess feed in aquaculture systems can elevate the concentrations of these nutrients (Nagaraju et al. 2022, 2024). BOD is further raised by the breakdown of organic matter and uneaten feed, which consumes dissolved oxygen during microbial processes (Reid et al. 2019; Nagaraju et al. 2022, 2024).

The significant input variables reported in earlier studies (Nagaraju et al. 2022, 2024) are pH, EC, TDS, hardness, Mg, and BOD. This study developed two prediction models with different sets of input variables to predict WQI. First, the easily measurable parameters pH, EC, TDS, hardness, and Mg were used. The MLR equations with these five significant input variables are given below for pre-monsoon, post-monsoon, and monsoon, respectively.

$$\begin{array}{c}\mathrm{MLRiWQI}_{\text{pre-monsoon}} = -16.556 + 10.028X_{1} + 0.1286X_{2} - 0.0462X_{3}\\ + 0.0032X_{5} - 0.4011X_{7}\end{array}$$
(14)
$$\begin{array}{c}\mathrm{MLRiWQI}_{\text{post-monsoon}} = -38.3030 + 11.3310X_{1} + 0.0978X_{2} - 0.0253X_{3}\\ - 0.0034X_{5} + 0.2938X_{7}\end{array}$$
(15)
$$\begin{array}{c}\mathrm{MLRiWQI}_{\text{monsoon}} = -14.354 + 7.4978X_{1} + 0.0891X_{2} - 0.0082X_{3}\\ + 0.0735X_{5} + 0.1371X_{7}\end{array}$$
(16)
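
The section does not show how the reduced-input coefficients were fitted. A minimal sketch is given below, assuming ordinary least squares with an intercept (as in Eqs. 14-16). It fits a model on a chosen subset of input columns by solving the normal equations (XᵀX)β = Xᵀy. The column indices and data are hypothetical.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular (collinear inputs?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols_subset(rows: Sequence[Sequence[float]], y: Sequence[float],
                   cols: Sequence[int]) -> List[float]:
    """Return [intercept, b_1, ..., b_k] for an OLS fit of y on the selected columns."""
    # Design matrix: a leading 1 for the intercept, then the chosen input columns.
    x = [[1.0] + [float(r[c]) for c in cols] for r in rows]
    p = len(x[0])
    if len(x) < p:
        raise ValueError("need at least as many observations as coefficients")
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return _solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical data generated from y = 2 + 3*col0 - 1*col2; col1 is unused.
    data = [[1.0, 5.0, 2.0], [2.0, 3.0, 1.0], [3.0, 8.0, 4.0], [4.0, 1.0, 3.0], [5.0, 7.0, 7.0]]
    target = [2 + 3 * r[0] - r[2] for r in data]
    print(fit_ols_subset(data, target, cols=[0, 2]))
```

If X1, X2, X3, X5, and X7 correspond to pH, EC, TDS, hardness, and Mg, as Eqs. (14)-(16) suggest, the five-input models would use zero-based columns [0, 1, 2, 4, 6]. Replacing pH with BOD amounts to swapping one index in `cols` for the column holding BOD. For inputs on very different scales (for example, pH versus EC or TDS), a QR-based least-squares solver is numerically safer than forming XᵀX.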

Figures 7 and 8 show the agreement between actual and predicted values for the MLR- and SVR-based models. The performance indices in Table 4 show that MLR and SVR achieve an R2 above 0.85 and a low average MAPE in all scenarios, indicating good performance (Nguyen et al. 2023). This is an important finding: machine learning models can calculate WQI from a dataset with limited variables, which is particularly useful where comprehensive water quality monitoring is difficult.

Fig. 7

Actual and predicted sets using MLR with limited inputs: a pre-monsoon, b monsoon, and c post-monsoon

Fig. 8

Actual and predicted sets using SVR with limited inputs: a pre-monsoon, b monsoon, and c post-monsoon

Table 4 Performance metrics of MLR and SVR models with significant input parameters

It is critical to keep all environmental parameters in aquaculture within ideal bounds. This minimizes stress caused by individual parameters and reduces the possibility of compound stress, which arises when two or more factors are not at their optimal levels. For instance, the combined adverse effects of pH and low temperature are more severe for shrimp than either alone. Similarly, fish injected intraperitoneally with low doses of Aeromonas hydrophila and exposed to suboptimal concentrations of ammonia and dissolved oxygen had a lower survival rate than an inoculated control group kept in water with ideal concentrations of these factors. Furthermore, with other factors held optimal, combining the two variables at suboptimal concentrations shortened lifespan more than a suboptimal level of either variable alone.

Figure 9 shows the prediction results of MLR and SVR when BOD is used as an input together with EC, TDS, hardness, and Mg. Replacing pH with BOD clearly improves prediction accuracy. With respect to the eclipsing effect, in which an aggregate index masks the poor quality indicated by an individual parameter, the more degraded water quality that the WQI reveals when BOD is included justifies its inclusion. The resulting index is also more responsive to BOD. The deterioration indicated by the WQI highlights the need for more research into the effects of this kind of activity on the surrounding community. Aquaculture ponds and surrounding waters may therefore be better managed by using specialized techniques to assess the amount of feed applied to each pond and by continuously monitoring BOD.

Fig. 9

Actual and predicted sets in pre-monsoon: a MLR model and b SVR model

The WQI values predicted by the SVR model agree closely with the observed values, demonstrating the model's predictive power. In the performance comparison, SVR (average MAPE of 1.74 and R2 of 0.98) outperforms MLR (average MAPE of 1.97 and R2 of 0.96). The SVR models fit the limited-variable data convincingly, offering a robust alternative. Whereas MLR relies on the ordinary least squares method, SVR is less sensitive to outliers and performs well in high-dimensional settings. This work highlights the efficiency and representativeness of SVR in forecasting WQI and its potential superiority in capturing complicated relationships within the data, despite the practical advantages of MLR in simplicity and reduced processing time.
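
The outlier-robustness point can be illustrated by comparing the per-point losses the two methods minimize. Least squares (MLR) squares each residual. ε-SVR uses Vapnik's ε-insensitive loss, which ignores residuals inside the ε-tube and grows only linearly outside it. The sketch below uses hypothetical residuals and an assumed ε. It compares loss shapes only and does not reproduce the study's models.

```python
def eps_insensitive_loss(residual: float, eps: float) -> float:
    """Vapnik's epsilon-insensitive loss: zero inside the tube, linear outside."""
    return max(0.0, abs(residual) - eps)


def eps_insensitive_grad(residual: float, eps: float) -> float:
    """(Sub)gradient of the epsilon-insensitive loss: bounded by 1 in magnitude."""
    if abs(residual) <= eps:
        return 0.0
    return 1.0 if residual > 0 else -1.0


def squared_loss(residual: float) -> float:
    """Squared-error loss minimized by ordinary least squares."""
    return residual * residual


def squared_grad(residual: float) -> float:
    """Gradient of the squared loss: grows without bound with the residual."""
    return 2.0 * residual


if __name__ == "__main__":
    eps = 0.1                                 # assumed tube half-width
    residuals = [0.05, -0.08, 0.1, 5.0]       # last value mimics an outlier (hypothetical)
    for r in residuals:
        print(r, eps_insensitive_loss(r, eps), squared_loss(r),
              eps_insensitive_grad(r, eps), squared_grad(r))
```

For the outlying residual of 5.0, the squared-loss gradient is 10.0 while the ε-insensitive gradient stays at 1.0. A single extreme point therefore pulls a least-squares fit much harder than an SVR fit.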

This research highlights the apparent superiority of the ML models, MLR and SVR, over the conventional GEM for building simplified equations with appreciably better performance. The MLR and SVR models have higher prediction accuracy, demonstrating their ability to capture complex relationships in the dataset, in contrast to GEM, which relies on the assumption of linear interactions. The advantage of the ML models is most evident when the most critical parameters identified by the machine learning algorithms are prioritized. This focused strategy improves predictive ability while also streamlining the model.

Conversely, although Gaussian elimination is fundamental in linear algebra, GEM may struggle with the complex interactions found in real-world datasets, which could lead to simplistic models with lower predictive accuracy. The advantage of machine learning models is especially noticeable in real-time scenarios, where they provide a quicker and more accurate substitute for time-consuming Gaussian elimination. Because machine learning models are flexible enough to accommodate a wide range of data patterns, they are an excellent tool for creating simplified equations that accurately represent the system's underlying complexity (Uddin et al. 2022b; Uddin et al. 2023a).

Conclusions and future work

Comparison with the GEM model and the statistical assessment parameters indicates that the MLR- and SVR-based models perform significantly better in prediction. Furthermore, the machine learning models also give favorable predictions when using only the few parameters that influence them most, which could save time otherwise spent on tedious testing. Of the two models, MLR and SVR, the SVR-based model is the more accurate. SVR is based on the structural risk minimization principle, which accounts for its higher performance. To achieve good generalization accuracy, this principle controls the capacity of the SVM, that is, the flatness of the regression function, rather than minimizing the empirical error alone.

In addition, using BOD instead of pH significantly improves the models. BOD was found to be a more important parameter in aquaculture zones, as earlier studies also reported. With BOD as one of the input variables alongside EC, TDS, hardness, and magnesium, both MLR and SVR gave the best predictions. This illustrates a benefit of machine learning models over the Gaussian elimination model when seeking reduced equations with better performance. Owing to their flexibility, adaptability, and capacity to handle non-linearities, machine learning models are a more reliable option for capturing the complex dynamics of the system under study.

In summary, this study has demonstrated the efficacy of these machine learning techniques in predicting water quality across seasonal fluctuations and human activities. Given the scarcity of hydrological and hybrid models, these two categories of predictive models should be investigated further. Such work could uncover more valuable information for the benefit of a safe and well-fed society and apply high-quality technology to water quality problems. The authors expect the results of this study to help improve the water quality index and guide future investigations.

Further research should evaluate all pertinent factors, including hydrology, ecosystems, and human activity, to fully understand the environmental, social, and economic effects.