Introduction

Eutrophication can occur in many types of aquatic ecosystems such as rivers, wetlands, lakes, and coastal areas (Le Moal et al., 2019). It is one of the main causes of water quality degradation in these ecosystems (Bennett et al., 2001; de Jonge et al., 2002; Le Moal et al., 2019; Smith, 2003; Smith et al., 1999; Vitousek et al., 1997; Zhou et al., 2020). It can also affect human health, goods and services, and economic activities (Le Moal et al., 2019).

The trophic state mainly reflects primary production, such as algal and aquatic plant productivity (Dodds & Cole, 2007). Variations in nutrient levels (e.g. phosphate) may cause significant changes in algal biomass (chlorophyll-a), which can in turn change water clarity or transparency (Deyab et al., 2021). A trophic state index (hereafter TSI) is a suitable method for quantifying this relationship. Carlson (1977) initially developed a trophic classification system for temperate lakes, scaled from 0 to 100. The most important parameters in this index are Secchi disk transparency, chlorophyll-a, and total phosphorus (Bekteshi & Cupi, 2014). Lamparelli (2004) proposed a modified index based on Carlson’s trophic state index, using almost the same parameters to assess the TSI of a water body. Xu et al. (2001) and Du et al. (2019) used several physical, chemical, and biological indicators, namely total phosphorus (TP, an important nutrient in water pollution), total nitrogen (TN), chemical oxygen demand (COD), Secchi disk transparency (SD), and chlorophyll-a (Chl-a), to describe the degree of eutrophication or trophic status of lake ecosystems.

The occurrence of eutrophication in a wetland depends partly on the nutrient inputs from its watershed. Eutrophication is most commonly caused by excessive amounts of nutrients (e.g. phosphate and nitrogen), of which phosphate plays a key role in enriching wetlands and lakes. The main source of N pollutants is run-off from agricultural land, whereas most P pollution comes from households and industry, including phosphorus-based detergents (Beusen et al., 2016).

Knowing the trophic state of aquatic ecosystems (particularly wetland ecosystems) can give insight into human impact, biotic integrity, and restoration planning (Dodds & Cole, 2007). Both natural and human factors can intensify eutrophication (Liu et al., 2010). Since eutrophication is a complicated phenomenon, more studies need to be directed toward broader-scale analyses of aquatic ecosystems (e.g. wetlands, lakes, and reservoirs). To achieve this goal, various issues need to be considered, such as measuring the most important physical and chemical parameters and calculating the trophic state indices or their relationship with morphology, land use, and land occupation (Fraterrigo & Downing, 2008; Liu et al., 2010; Noges, 2009; Taranu & Gregory-Eaves, 2008). According to earlier reports (Armengol & Miracle, 1999; Hamilton et al., 2001; Liu et al., 2010; Taranu & Gregory-Eaves, 2008), eutrophication in a water body may also be significantly influenced by structural parameters such as the depth and volume of the aquatic ecosystem.

The international Anzali wetland (one of the most important wetlands in northern Iran) is an enriched freshwater lagoon in the southwest of the Caspian lowlands. This wetland is a nursery habitat for various fish species and a reproduction and wintering habitat for waterbirds from different parts of the world (Nazarhaghighi et al., 2014; Sadeghi et al., 2021) and for several mammals (JICA, 2019). The wetland is threatened by a wide variety of anthropogenic pressures, including eutrophication, deforestation, erosion, unsustainable hunting, inappropriate tourism, road and urban development (Sadeghi et al., 2012a, b), toxic substances (Mortazavi, 2018), heavy metals (Hassanzadeh et al., 2021), and the invasion of exotic species (Zarkami, 2016). In addition, agricultural activities and the discharge of industrial and sanitary wastewater are the main sources of pollutants in the Anzali wetland (Sadeghi et al., 2013). In the last few decades, this wetland has been threatened by the influx of chemicals, particularly phosphorus and nitrogen, leading to eutrophication in many parts of this international ecosystem (Sadeghi et al., 2021). As a result, eutrophication has become a serious threat to the Anzali wetland ecosystem. Numerous contaminants have negatively affected water quality and thus degraded the structure of the wetland. At present, more manure and chemical fertilizers are being applied in the wetland watershed to promote soil fertility and increase crop production (particularly in the rice paddy fields), leading to nutrient enrichment of the water in this valuable wetland. This has, in turn, created favourable conditions for the overgrowth of invasive exotic species such as water fern (Azolla filiculoides) (Sadeghi et al., 2017) and water hyacinth (Eichhornia crassipes) (Zarkami et al., 2021), leading to reduced water quality and the degradation of several habitat characteristics (Sadeghi et al., 2021).

Among the different types of pollution in wetlands, eutrophication in particular requires an appropriately designed monitoring campaign, because wetland and lake eutrophication is becoming a global concern in water pollution. Such monitoring must consider both biotic variables (e.g. algal biomass and its relationship with chlorophyll-a) and a set of the most pertinent abiotic variables (total phosphorus, water transparency, etc.). In addition, appropriate predictive modelling techniques (e.g. Ambelu et al., 2010; Zarkami et al., 2019) need to be implemented for the proper management of wetlands (Sadeghi et al., 2013). Many studies have examined eutrophication in aquatic ecosystems around the world, since this process is causing serious problems for lakes and wetlands (Hecky & Kilham, 1988).

The present study first aimed to determine the TSI in the main parts of the international Anzali wetland. After the TSI was calculated, a data-driven model, a classification tree (CT) with the J48 algorithm (e.g. Ambelu et al., 2010), was implemented to predict to what extent the environmental variables may contribute to eutrophication in the wetland. The output of the model was the concentration of chlorophyll-a, and the inputs consisted of a combination of categorical variables, water quality variables, and physical and structural variables (variables associated with the shape and structure of the wetland, such as water depth), which were measured monthly in the main parts of the Anzali wetland.

Materials and methods

Description of study area

As stated earlier, various problems such as anthropogenic impacts (Sadeghi et al., 2017) and the invasion of exotic species (Zarkami, 2016) are threatening the Anzali wetland, and eutrophication is one of the main ones.

In the present research, the main parts of the Anzali wetland consisting of the eastern, western, central, and southern parts of the wetland were monitored to examine the trophic condition in this valuable ecosystem.

The eastern wetland (Sheijan) is the shallowest part of the wetland. The growth of aquatic plants (mainly Phragmites australis) is high because of the large amount of nutrients entering this section of the ecosystem. The spread of common reed (Phragmites australis) and the resulting loss of waterfowl habitat are the main concerns in this area. As a result, many species are threatened; they require a low level of chemical oxygen demand to survive (JICA, 2019). Artificial disturbance is relatively small because of the low level of human activity in this area. Although the eastern area constitutes a small part of the wetland ecosystem, many polluted rivers discharge into it. Consequently, the eastern wetland is more polluted than other parts of the ecosystem.

The western wetland (Abkenar) covers more than one-third of the Anzali wetland ecosystem and is considered a large lagoon area. In contrast to the eastern part, the western wetland is the deepest area of the Anzali wetland ecosystem. Many adult fish species inhabit the lagoon. Water quality in the western area is relatively better than in other parts of the wetland. Moreover, the water in the western wetland is turbulent, and dissolved oxygen is relatively higher than in other parts of the wetland (Abedini, 2015). Overall, the western area is less polluted than other sections of the ecosystem. Artificial disturbance is relatively low in the western end. The western end is rich in marsh, where emergent plants, particularly common reed (Phragmites australis), are plentiful. The marsh of the western end has high potential for waterfowl. Fishing and hunting are the most important activities in this area, and many boats enter it.

The central wetland (Sorkhankol) is considered a wildlife refuge rather than a legally protected area. It is located between the Pirbazar River and Anzali Port. There are open water areas in this part of the wetland, and floating and emergent plants are abundant. The central wetland is deeper than the eastern part. Many boats pass through this part, and polluted water also flows into it. Biodiversity is relatively low in the central wetland.

The southern part (Siahkeshim) is a protected area of about 4500 ha. Many rivers and streams discharge into the Siahkeshim. Emergent plants such as Phragmites australis overgrow the entire area. Water depth is low in the southern part. This area hosts many bird species. The southern part of the wetland is in danger of encroachment (JICA, 2005). Figure 1 shows the geographical location of the study areas and the sampling sites in the main parts of the Anzali wetland.

Fig. 1 Map of study areas indicating the sampling sites in the main parts of the Anzali wetland

Sampling campaign

Ten sampling sites were selected in four parts of the Anzali wetland, namely the eastern, central, and western parts and the Siahkeshim protected area, plus the shipping channel (Fig. 1). The sites were monitored monthly from the 15th of April 2016 to the 15th of March 2017 (12 months in total). Site 1 was selected at the end of the shipping channel (wave breaking section). This site contributed less to the monitoring than the other parts of the ecosystem. Sites 2 and 3 were chosen in the central part. Two sampling sites (sites 4 and 5) were monitored in the eastern section of the wetland. Three sampling sites (sites 6, 7, and 8) were chosen in the western area, which therefore contributed more to the sampling than the other parts. Sites 9 and 10 were selected in the Siahkeshim protected area. The eastern, central, and Siahkeshim parts were equally sampled.

Various environmental variables were monitored monthly at each sampling location. Sampling was repeated three times in the vicinity of each site, and the three water samples were combined into a single composite sample. In total, therefore, 120 samples were obtained from all sampling locations over the 1-year study period (12 months). Most samples were taken in the western wetland (36 samples), whereas the shipping channel contributed the fewest (12 samples). The other three parts (Siahkeshim protected area, the eastern and central parts) were equally sampled (24 samples for each part and 72 samples overall for the three parts).
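
The sample counts above follow directly from the site allocation and the 12 monthly visits. As a quick consistency check, the arithmetic can be sketched as follows; the dictionary and function names are illustrative only, and the site-to-part assignment is taken from the description of the sampling sites.

```python
# Site-to-part assignment as described in the sampling campaign.
SITES_BY_PART = {
    "shipping channel": [1],
    "central": [2, 3],
    "eastern": [4, 5],
    "western": [6, 7, 8],
    "Siahkeshim": [9, 10],
}
MONTHS = 12  # one composite sample per site per month


def samples_per_part(sites_by_part, months):
    """Return the number of composite samples contributed by each part."""
    return {part: len(sites) * months for part, sites in sites_by_part.items()}


counts = samples_per_part(SITES_BY_PART, MONTHS)
total = sum(counts.values())
print(counts, total)
```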

The water quality variables measured on site were pH (using a combination electrode calibrated against buffer solutions), electric conductivity (measured with or without temperature compensation and calibrated against a standard solution of potassium chloride), and dissolved oxygen (keeping a hand-held meter in gentle motion through the water column while reading the device). The physical-structural variables measured in situ were water and air temperature (with a digital thermometer) and water depth (with a yardstick). The chlorophyll-a sample was collected on a new glass fibre filter paper after filtration of a known volume of water. The pigments were extracted from the filter in the laboratory and measured by the spectrophotometric method. All water quality samples were kept in a cold storage container (4 °C) and sent promptly to the laboratory (within 2–3 h) for analysis with standard methods (APHA/AWWA/WEF, 2012). Samples were collected in the middle of each month.

Trophic state index (TSI)

In the present research, a multi-parameter TSI evaluation method (Carlson, 1977, as modified by Lamparelli, 2004) was used to determine the eutrophication condition in the different parts of the wetland (Table 1). This method proposes TSI ranges for the most important parameters, namely Secchi disk transparency (SD), chlorophyll-a (Chl-a), and total phosphate (TP), to evaluate the TSI of a water body. The equations for SD, Chl-a, TP, and total nitrogen (TN) each yield a value on a scale from 0 to 100. The concept of the TSI is based on changes in nutrient levels (particularly total phosphorus). Total phosphorus can cause changes in algal biomass (chlorophyll-a), which can in turn lead to changes in water clarity (Secchi disk transparency).
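
To make the 0 to 100 scale concrete, the sketch below implements the original Carlson (1977) sub-index equations. It is an illustration only: the study used the version modified by Lamparelli (2004), whose coefficients and class boundaries (Table 1) are not reproduced here and may differ. Carlson's equations expect Secchi depth in metres and chlorophyll-a and total phosphorus in µg/l, so values reported in mg/l would need converting first. Averaging the three sub-indices is one common convention and is an assumption here, as are the function names.

```python
import math


def tsi_sd(sd_m):
    """Carlson (1977) TSI from Secchi disk depth in metres."""
    return 10 * (6 - math.log(sd_m) / math.log(2))


def tsi_chl(chl_ug_l):
    """Carlson (1977) TSI from chlorophyll-a in micrograms per litre."""
    return 10 * (6 - (2.04 - 0.68 * math.log(chl_ug_l)) / math.log(2))


def tsi_tp(tp_ug_l):
    """Carlson (1977) TSI from total phosphorus in micrograms per litre."""
    return 10 * (6 - math.log(48 / tp_ug_l) / math.log(2))


def tsi_mean(sd_m, chl_ug_l, tp_ug_l):
    """Unweighted mean of the three sub-indices (an assumed convention)."""
    return (tsi_sd(sd_m) + tsi_chl(chl_ug_l) + tsi_tp(tp_ug_l)) / 3


# Illustrative inputs only, not measurements from the study.
print(round(tsi_sd(1.0), 1), round(tsi_tp(48.0), 1))
```

Each halving of Secchi depth, or doubling of total phosphorus, raises the corresponding sub-index by 10 units, which is what makes the three sub-indices comparable on one scale.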

Table 1 Classification of water resources based on the proposed range of trophic state index (TSI) (Carlson, 1977, and modified by Lamparelli, 2004)

The applied model (the CT model with a J48 algorithm)

Decision trees (Quinlan, 1986), also called classification trees (Breiman et al., 1984), which were applied in the present study, are artificial intelligence-based predictive models. They predict a dependent variable from a set of independent variables, which may be continuous or discrete (D’heygere et al., 2003).

After the TSI was determined for the different parts of the Anzali wetland, a CT model with the J48 algorithm was implemented to identify the most important factors for assessing the trophic condition in the study areas. Because the dataset in the present research was relatively small, k-fold cross-validation was used instead of the other test options available in the WEKA toolbox, namely “supplied test set” (a separate file containing the test set is specified, which is very often used for large datasets), “use training set” (the derived model is evaluated on the training set), and “percentage split” (a certain percentage of the data is held out for testing) (Witten et al., 2011).
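
The idea behind k-fold cross-validation can be sketched in a few lines: the instances are shuffled, split into k folds of nearly equal size, and each fold serves once as the test set while the rest are used for training. The sketch below is a plain, unstratified illustration and not WEKA's own procedure; the function name, the seed, and the choice of seven folds for 120 instances (the setting reported in the results) are used here purely as an example.

```python
import random


def k_fold_indices(n_instances, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal instances round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(120, 7))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Changing the seed changes which instances fall into which fold, which is why repeating the procedure with several randomizations gives a more stable picture of model performance.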

By applying the CT model (after determining the TSI), it was possible to determine to what extent the variables selected by the model may have contributed to eutrophication in the wetland.

Various environmental variables were entered into the model (air temperature was excluded because of its high correlation with water temperature). In addition to the environmental variables, the sampling sites and seasons were used as categorical variables. The sites and seasons were coded as 10 sites and 4 seasons to show the possible effect of spatial (sites) and temporal (seasonality) patterns on the eutrophication condition in the wetland. In total, therefore, 12 input variables (10 environmental variables + 2 categorical variables) and one output variable (chlorophyll-a) were entered into the model.

Since chlorophyll-a (the only output variable in the present research) functions as a photosynthetic pigment essential for photosynthesis in organisms (e.g. eukaryotes, cyanobacteria, and prochlorophytes), its concentration is closely associated with the abundance and biomass of phytoplankton. Therefore, various independent variables (such as the input variables recorded in the present study) may influence the amount of chlorophyll-a in the wetland. For this reason, chlorophyll-a (an indicator of phytoplankton abundance and biomass in coastal and estuarine waters) was selected as the model’s output (the dependent variable). The output was divided into three categorical TSI classes obtained from the TSI in the wetland (eutrophic, super-eutrophic, and hyper-eutrophic). As stated above, the environmental variables were employed as the model’s inputs.

The reliability of the model was checked with four pruning confidence factors (PCFs = 0.50, 0.25, 0.10, and 0.01) that are commonly used in the literature (Witten et al., 2011). The accuracy of the J48 algorithm was assessed with two predictive performance criteria, the percentage of correctly classified instances (CCI%; Eq. 1) (Witten et al., 2011) and Cohen’s Kappa, or simply k (Eq. 2) (Cohen, 1960). For the multiclass classification (eutrophic, super-eutrophic, and hyper-eutrophic), both criteria were derived from the confusion matrix presented in Table 2, in which the model predictions are tabulated against the real observations (Sadeghi, 2022). The two performance criteria used for evaluating the three TSI classes were as follows:

$$\mathrm{Correctly\;Classified\;Instances}\;(\mathrm{CCI}\%)=\frac{a+e+i}{N}\times 100$$
(1)
$$\mathrm{Cohen's\;Kappa}\;(k)=\frac{(\mathrm{I}/N)-\left[(\mathrm{II}/N)(\mathrm{III}/N)+(\mathrm{IV}/N)(\mathrm{V}/N)+(\mathrm{VI}/N)(\mathrm{VII}/N)\right]}{1-\left[(\mathrm{II}/N)(\mathrm{III}/N)+(\mathrm{IV}/N)(\mathrm{V}/N)+(\mathrm{VI}/N)(\mathrm{VII}/N)\right]}$$
(2)

where N is the total number of instances (120 instances); a to i are the cells of the confusion matrix (Table 2); a + e + i = I; a + d + g = II; a + b + c = III; b + e + h = IV; d + e + f = V; c + f + i = VI; g + h + i = VII.
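
Equations (1) and (2) can be evaluated directly from a 3 × 3 confusion matrix. The sketch below assumes the cells are laid out row by row as a, b, c / d, e, f / g, h, i, which is consistent with the sums defined above; the function name and the example matrix are hypothetical and are not the study's results.

```python
def cci_and_kappa(matrix):
    """Return (CCI%, Cohen's kappa) for a square confusion matrix."""
    size = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Observed agreement: diagonal cells (a + e + i) over N.
    observed = sum(matrix[i][i] for i in range(size)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    # Chance agreement: sum over classes of (row total / N) * (column total / N).
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return 100 * observed, kappa


# Hypothetical confusion matrix for three classes, 120 instances in total.
example = [
    [30, 6, 4],
    [5, 28, 7],
    [3, 5, 32],
]
cci, kappa = cci_and_kappa(example)
print(cci, kappa)
```

Kappa is lower than the raw agreement because it discounts the share of correct classifications that would be expected by chance given the class totals.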

Table 2 The confusion matrix for three TSI classes as a basis for evaluating the observed (actual) classes vs predicted ones

Results

Data pre-processing and assessment of trophic state index before model development

The distribution of data for the recorded variables in the wetland

Table 3 shows the data distribution of the recorded variables used for analysing the eutrophication status of the Anzali wetland. Except for bicarbonate (p > 0.05), the environmental variables were not normally distributed (p < 0.05). Total phosphate (p < 0.01) and dissolved oxygen (p < 0.01) differed significantly among the TSI classes (Mann–Whitney test), which indicates a significant relationship between TSI class and the concentrations of these variables.

Table 3 The data distribution of the recorded variables for analysing the eutrophication status in the Anzali wetland ecosystem. Stand. dev standard deviation, EC electric conductivity, DO dissolved oxygen, TN total nitrogen, TP total phosphate, WT water temperature, AT air temperature, Chl-a chlorophyll-a, Tra transparency. *: not used for the model development

Outliers were observed in several variables; nevertheless, all observations were retained in the model. Mild and extreme outliers were observed for chlorophyll-a, water turbidity, total phosphate, total nitrogen, electric conductivity, and water transparency (Fig. 2). As visualized in the box and whisker plots, extreme outliers occurred when the concentration of chlorophyll-a exceeded 200 µg/l. Extreme values were also detected for the nutrients where the concentrations of total phosphate and total nitrogen exceeded 0.60 mg/l and 3 mg/l, respectively. For water transparency and turbidity, such outliers were observed at values higher than 1.95 m and 100 FTU, respectively. For electric conductivity, values greater than 9000 µS/cm were considered extreme outliers. The outliers in total phosphate and total nitrogen are to be expected given the eutrophication of the wetland by excessive amounts of nutrients at different sampling sites and in different seasons. For water transparency, the extreme values can be related to variations in inorganic particles (e.g. sediment from erosion) or organic particles (such as dead algae and phytoplankton) in the wetland in different seasons. Differences in the concentrations of dissolved ions among sampling sites and seasons may cause the extreme outliers in electric conductivity. The extreme data points for chlorophyll-a can be attributed to variation in photosynthetic activity in different seasons. The large fluctuation in water turbidity results from suspended matter entering the Anzali wetland in different seasons.
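
The text does not state how mild and extreme outliers were delimited. A common box-plot convention, assumed here, is Tukey's fences: values more than 1.5 interquartile ranges (IQR) beyond the quartiles are mild outliers, and values more than 3 IQR beyond them are extreme. The sketch below applies that rule; the function name and the sample values are hypothetical, and different software may compute quartiles slightly differently.

```python
import statistics


def classify_outliers(values):
    """Split values into mild and extreme outliers using Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mild, extreme = [], []
    for v in values:
        # Distance outside the box; negative or zero for values inside it.
        distance = max(q1 - v, v - q3)
        if distance > 3 * iqr:
            extreme.append(v)
        elif distance > 1.5 * iqr:
            mild.append(v)
    return mild, extreme


# Hypothetical chlorophyll-a values in µg/l, not data from the study.
sample = [10, 12, 14, 15, 16, 18, 20, 22, 24, 40, 250]
mild, extreme = classify_outliers(sample)
print(mild, extreme)
```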

Fig. 2 Box and whisker plots showing the range of data distribution in the Anzali wetland in different seasons (number of instances: 120). The outliers are indicated with circles. TP Total phosphate, TN Total Nitrogen, Tur Turbidity, Chl-a Chlorophyll-a, Tra Transparency (transparency is presented in “cm” to visualize the outliers better), EC Electric Conductivity

The yearly average of the monitored variables in the wetland

The yearly average of the recorded variables in different parts of the wetland is presented in Table 4. As shown in the table, the maximum and minimum nutrient concentrations (total phosphate and total nitrogen) were mainly observed in the eastern and western parts of the wetland, respectively, whereas the maximum and minimum water transparency were observed in the western and central sections, respectively. The maximum and minimum chlorophyll-a were found in the central and western sections of the ecosystem, respectively.

Table 4 The yearly average of the recorded variables in the different parts of the Anzali wetland. WT water temperature, EC electric conductivity, DO dissolved oxygen, TN total nitrogen, TP total phosphate, AT air temperature, Tra transparency. *: not used for the model development

TSI

Table 5 shows the trophic state index (TSI) results for different parts of the Anzali wetland. Comparing Table 5 with Table 1 shows that the western and Siahkeshim parts of the wetland are classified as eutrophic and super-eutrophic, respectively. The other sections of the ecosystem (eastern, central, and shipping channel) are classified as hyper-eutrophic, and the wetland as a whole is also classified as hyper-eutrophic.

Table 5 Calculation of TSI for different parts of the Anzali wetland

Correlation analysis

According to the correlation analysis, a strong positive correlation (absolute value > 0.50) was found between water temperature and air temperature (p < 0.01, r = 0.94); as stated above, air temperature was excluded from the CT model because of this strong correlation. Chlorophyll-a was positively correlated with several variables, including total phosphate (p < 0.01, r = 0.62), total nitrogen (p < 0.01, r = 0.49), water turbidity (p < 0.01, r = 0.56), water temperature (p < 0.01, r = 0.61), and air temperature (p < 0.01, r = 0.59). This indicates that an increase in chlorophyll-a is associated with an increase in nutrients (total phosphate and total nitrogen), water turbidity, and water temperature. In contrast, strong negative correlations (absolute value > 0.50) were found between water turbidity and water transparency (p < 0.01, r = −0.57) and between bicarbonate and pH (p < 0.01, r = −0.54), which suggests that increases in water turbidity and bicarbonate may be accompanied by decreases in water transparency and pH, respectively.
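
Screening input variables for redundancy, as was done for air and water temperature, can be sketched as follows. The type of correlation coefficient is not stated in the text; Pearson's r is assumed here, and the 0.9 cut-off, the function names, and the sample values are all hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(columns, threshold=0.9):
    """List variable pairs whose absolute correlation reaches the threshold."""
    names = sorted(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(columns[first], columns[second])
            if abs(r) >= threshold:
                pairs.append((first, second, round(r, 2)))
    return pairs


# Hypothetical monthly values, not data from the study.
columns = {
    "water_temp": [10, 15, 20, 25, 28],
    "air_temp": [12, 16, 23, 27, 30],
    "depth": [1.0, 0.8, 1.2, 0.9, 1.1],
}
pairs = redundant_pairs(columns)
print(pairs)
```

One variable of each flagged pair can then be dropped before model building, since two nearly collinear inputs carry almost the same information.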

Relationship between the environmental variables and the monitoring sites/seasons

According to the one-way ANOVA (with post hoc test), several variables differed significantly among both sampling sites and seasons: pH (sites: F = 7.53, p < 0.05; seasons: F = 19.06, p < 0.05), electric conductivity (sites: F = 4.00, p < 0.05; seasons: F = 22.29, p < 0.05), dissolved oxygen (sites: F = 6.50, p < 0.05; seasons: F = 5.76, p < 0.05), water transparency (sites: F = 3.70, p < 0.05; seasons: F = 9.63, p < 0.05), and total phosphate (sites: F = 5.54, p < 0.05; seasons: F = 8.67, p < 0.05). Water turbidity, total nitrogen, chlorophyll-a, and air/water temperature differed significantly only among sampling seasons (p < 0.05 for all variables), while water depth and bicarbonate differed significantly only among sampling sites (p < 0.05 for both variables).

Modelling

When the CT model was run with all variables (1 output and 12 input variables), the model selected only one variable (the sampling sites), and no other variable was selected. This CT result therefore added no value for the interpretation of eutrophication because no meaningful tree was constructed. Consequently, this categorical variable (the sites) had to be excluded from the model to construct a meaningful tree. When the other input variables (the environmental variables and seasonality) were retained, a tree including seasonality was constructed, but the variable “seasonality” was selected only at the weakest pruning level (PCF = 0.50). This made the constructed tree very complex (number of leaves: 14; size of the tree: 25), and such complexity again added little value for the interpretation of eutrophication in the wetland. The variable “seasonality” was therefore also excluded from the model in the next step. When the remaining 10 input variables were entered into the J48 algorithm, eight input variables were finally selected by the model. In this case, a seven-fold cross-validation (after five randomization runs) gave better predictive performance (CCI = 80%, k = 0.67) than the other fold settings (Fig. 3). This indicates that the model reached its highest reliability for predicting the trophic condition in the sampling sites at this setting.

Fig. 3 Checking the reliability of the model by applying different numbers of cross-validation folds for analysing the trophic condition in the Anzali wetland (CCI %: percentage of correctly classified instances)

The constructed tree (with the strongest pruning, i.e. the lowest pruning confidence factor: PCF = 0.01) (Fig. 4) showed that several variables, namely bicarbonate, total phosphate, water turbidity, pH, electric conductivity, dissolved oxygen, water temperature, and water depth, may play a key role in predicting the trophic state in the Anzali wetland. The decision tree started growing with “bicarbonate.” At this step, the model did not make a final decision on the basis of “bicarbonate” alone; however, all variables selected further down the tree depended on it. According to the rule extracted from the CT model, when pH moves toward basic conditions (> 7.75) and water turbidity is high (> 60 FTU), the ecosystem is considered hyper-eutrophic. When water turbidity falls below 60 FTU, the wetland may be classified as eutrophic. The rule is reasonable since 34.29 out of 120 instances were assigned to it, of which 5.29 were misclassified. On the right side of the tree, the variables water temperature, dissolved oxygen, electric conductivity, water depth, water turbidity, total phosphate, and pH were selected by the model. If water temperature is low (≤ 28 °C), the prediction of the model depends on several factors. In contrast, if water temperature is high (> 28 °C), the concentration of dissolved oxygen determines the trophic condition in the wetland. According to the model, if dissolved oxygen drops (≤ 6.93 mg/l), the wetland will become hyper-eutrophic. If electric conductivity is high (> 1008 µS/cm), the wetland will be hyper-eutrophic, whereas when electric conductivity is low (≤ 1008 µS/cm), the trophic condition may depend on several factors, namely water depth, water turbidity, total phosphate, and pH. The wetland may be classified as hyper-eutrophic when total phosphate is high (> 0.13 mg/l) or the depth of the wetland is low (≤ 0.8 m).

Fig. 4 The induced tree for predicting the trophic state in the Anzali wetland (PCF = 0.01; values between brackets indicate the instances in which rules are true/false); the number of instances: 120; DO, dissolved oxygen; WT, water temperature; blue rectangle, eutrophic (Eu); orange rectangle, super-eutrophic (Seu); pink rectangle, hyper-eutrophic (Heu); grey oval, the selected attributes by the model

Among the 10 input variables entered into the model (after sampling sites and seasonality were excluded in two consecutive stages), total nitrogen and water transparency were never selected by the induced model. According to the correlation analysis, total nitrogen and water transparency were strongly correlated with total phosphate (r = 0.60) and water turbidity (r = −0.57), respectively. The model therefore appears to have retained only the more informative variable of each correlated pair.

Discussion

Since eutrophication is affected by both anthropogenic and natural factors (Liu et al., 2010), research on the trophic condition of wetlands is globally important for the management of these valuable ecosystems. The present research therefore first aimed to examine the TSI in one of the most important wetlands in Iran (the international Anzali wetland). In addition, predictive modelling (the CT model with the J48 algorithm) helped identify the most important explanatory variables that may play a key role in the eutrophication condition of the wetland. In this way, it was possible to determine to what extent the selected variables may contribute to eutrophication in the wetland.

The maximum amount of chlorophyll-a was found at the sampling sites where the nutrients (mainly phosphate and nitrate) had the highest concentrations. According to earlier findings (e.g. Le Moal et al., 2019), the availability of nutrients (particularly phosphate and nitrate) can stimulate photosynthetic activity, leading to high algal production. This can, in turn, intensify the eutrophication problem in the wetland. Moreover, an elevated amount of chlorophyll-a may increase water turbidity in the wetland.

According to the correlation analysis, the minimum amount of chlorophyll-a is linked to elevated water transparency. This can be explained by the fact that sampling sites with clear water (high water transparency or low water turbidity) and less polluted water (high oxygen concentration) are not favourable places for phytoplankton production (Le Moal et al., 2019; Silvino & Barbosa, 2015).

By introducing various variables to the J48 algorithm (physical–chemical and structural variables, and variables related to spatial and temporal patterns), the model identified the most important explanatory variables for predicting the trophic state of the wetland. Such variables can help wetland managers to restore aquatic ecosystems appropriately (Lock & Goethals, 2012) with respect to the eutrophication problem. In the present study, the highest level of tree pruning (PCF = 0.01) was applied to obtain a simple but more powerful tree (De'ath & Fabricius, 2000). Earlier studies (Witten et al., 2011) showed that pruning at a high level (i.e. a low PCF value, e.g. PCF = 0.01) may decrease the size and complexity of the model. Optimizing the tree at the highest pruning level made the selected variables easy to interpret when analysing the eutrophication condition in the Anzali wetland.

Based on the model’s outcome, the variable “bicarbonate” was the first explanatory predictor of the eutrophication condition in the wetland. This predictor appeared at the root of the induced tree, and the other selected factors were associated with it. At this step, however, no definitive decision was made by the model. A likely reason for the prominence of bicarbonate is that an increase in its concentration may lead to more nutrients (particularly total phosphate) being taken up by the algae in the Anzali wetland. This may, in turn, increase the biomass production of algae and hence intensify eutrophication in this ecosystem (e.g. Bordalo et al., 2001; Cloern, 2001). Water temperature and pH appeared on the right and left sides of the tree, respectively. All three variables in the first and second levels of the induced tree (bicarbonate, water temperature, and pH) are intercorrelated. At the sampling sites, pH was strongly and negatively correlated with bicarbonate (r = −0.54).
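How a C4.5-style algorithm decides which variable to place at the top of the tree can be sketched with the gain ratio of a binary threshold split. This is a simplified illustration: the full algorithm applies further rules when comparing candidate splits, and the function names, threshold, and class labels below are hypothetical, not values from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting a numeric variable at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # the split does not separate the data at all
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    # Information gain: reduction in class entropy achieved by the split
    info_gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    # Split information: entropy of the partition sizes themselves
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info

# Hypothetical bicarbonate readings and trophic classes (illustration only)
bicarbonate = [1.0, 2.0, 3.0, 4.0]
trophic_class = ["eutrophic", "eutrophic", "hyper-eutrophic", "hyper-eutrophic"]
best = gain_ratio(bicarbonate, trophic_class, threshold=2.0)
```

The variable and threshold with the highest score are placed at the root, which is the position bicarbonate occupies in the induced tree.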

As confirmed by the model, physical-structural variables such as water temperature and water depth, along with water quality variables, may play a key role in predicting the trophic condition in the Anzali wetland. Based on the outcomes of the model, the hyper-eutrophic condition of the wetland (high phytoplankton production) may be linked to high water temperature together with low dissolved oxygen. This prediction of the CT model is logical because high water temperature, particularly in the hot seasons (spring and summer are hot in the climate of northern Iran), can stimulate algal blooms in the wetland. Many other factors (e.g. nutrients) may also accelerate algal blooms and hence contribute to the eutrophication problem, but water/air temperature can be one of the most important drivers of algal growth. The overgrowth of algae may, in turn, result in a significant drop in the dissolved oxygen concentration in the wetland. It is worth mentioning that the concentration of dissolved oxygen can fluctuate considerably between daytime and nighttime in the eutrophicated wetland. Nevertheless, based on the applied model, the general trend of dissolved oxygen shows a significant drop over a long period of time (the 1-year study period) in this international ecosystem. The impact of water/air temperature on the eutrophication of aquatic ecosystems was already confirmed by earlier studies (e.g. Le Moal et al., 2019).

The results of the model also showed that the wetland may be hyper-eutrophic where the water is shallower. In contrast, the wetland may be eutrophic where the water is deeper. The maximum water depth was measured in the western part of the wetland. This indicates that a gradual increase in water depth in the Anzali wetland may limit phytoplankton blooms and, as a result, lead to low chlorophyll-a production by phytoplankton. This finding is in line with previous reports (e.g. Le Moal et al., 2019; Liu et al., 2010; Taranu & Gregory-Eaves, 2008), which confirmed that shallower waters can make wetlands more eutrophic than deeper waters. However, based on the model’s predictions, the wetland may still be hyper-eutrophic at greater water depths when nutrients (e.g. phosphate) are found in high concentrations.

Based on the outcomes of the predictive model, a hyper-eutrophic condition in the wetland is associated with an increase in pH together with high water turbidity. A possible reason is that high primary production by phytoplankton, supported by easy access to nutrients (e.g. phosphate), is linked to high pH. In other words, the change in pH may result from algal blooms, which remove carbon dioxide from the water. The minimum (6.9) and maximum (10.03) pH values were recorded at site 1 (characterized by the hyper-eutrophic condition) and site 6 (characterized by the eutrophic condition), respectively. One of the main causes of pH variation in the Anzali wetland may be the availability of carbonate (positively correlated with pH) or bicarbonate (negatively correlated with pH).

Another factor that contributed to the prediction of eutrophication in the wetland is electric conductivity. As already stated in the correlation analysis, electric conductivity was significantly and positively correlated with the amount of chlorophyll-a in the wetland (p < 0.01, r = 0.40). The model predicted that the highest trophic condition (hyper-eutrophic) may be found in the Anzali wetland where electric conductivity reaches its highest values. The maximum electric conductivity was found at site 1 (owing to the vicinity of this sampling site to the Caspian Sea). It can be argued that, owing to low precipitation and rising air/water temperature, particularly in the warm seasons of the Anzali wetland’s climate, evaporation from the surface water of the wetland increases. In addition, electric conductivity increases in the water because of the vicinity of some sampling sites to the Caspian Sea (e.g. sites 1, 2, and 3). Therefore, phytoplankton activity, and hence chlorophyll-a, may increase under such conditions.

According to the model, an increase in chlorophyll-a (displayed as the super- and hyper-eutrophic conditions on the constructed tree) is associated with increased water turbidity. The model’s prediction is reasonable because the variables “electric conductivity, chlorophyll-a, total phosphate, and total nitrogen” are closely related to water turbidity in the Anzali wetland. Earlier studies (e.g. Silvino & Barbosa, 2015) showed that an increase in water turbidity (especially in shallow water) may considerably contribute to water enrichment. In the present research, the mean depth recorded across the Anzali wetland was 1.78 m (with minimum and maximum depths of 0.15 m and 5 m, respectively). Also, the yearly average of the recorded depth in the Anzali wetland varied from 0.65 m (in the central part) to 4.36 m (in the shipping channel). This indicates that the entire wetland has become shallower than in the past few years, so that the water may appear cloudy or murky in the shallower parts of the wetland. The shallower parts of the Anzali wetland may, in turn, have high turbidity caused by several factors, including phytoplankton, erosion, urban runoff, wastewater discharge, algae, and sediment disturbance. At the same time, in the shallower parts of the wetland, light can be easily available for the photosynthetic activity of phytoplankton or algae.

According to the model’s prediction, adequate light penetration into shallow waters (Le Moal et al., 2019), together with the availability of nutrients (e.g. phosphate and nitrate), will in turn increase the density of phytoplankton owing to sufficient photosynthetic activity.

The results of the predictive model showed that an increase in the concentration of nutrients such as total phosphate will increase the degree of eutrophication in the Anzali wetland, resulting in a hyper-eutrophic condition in this ecosystem. This was also confirmed by previous studies (e.g. Le Moal et al., 2019; Silvino & Barbosa, 2015) demonstrating that the hyper-eutrophic condition of wetlands, particularly in hot seasons, is associated with high concentrations of nutrients such as total phosphate and total nitrogen. The effect of nutrient-enriched (eutrophicated) waters on the growth of algae was already confirmed by Le Moal et al. (2019).

Among the variables used to analyse the TSI in the Anzali wetland, total nitrogen and water transparency were never selected by the model. As already stated, two nutrients (total phosphate and total nitrogen) were significantly and positively correlated with chlorophyll-a in the study wetland (they are also strongly intercorrelated). In such a case, data-driven methods like the induced CT model give top priority to the most important variable(s), and total phosphate is a more important nutrient than total nitrogen for stimulating algal blooms in wetlands and lakes. For the same reason, nitrogen is not often used in the TSI analysis of lakes and wetlands (e.g. Bekteshi & Cupi, 2014; Carlson, 1977). This is a possible reason why total nitrogen was not selected by the model for evaluating the eutrophication condition in the Anzali wetland. Water turbidity and water transparency were positively and negatively correlated with chlorophyll-a in the wetland, respectively (p < 0.01, r = 0.56 for water turbidity and p < 0.01, r = −0.40 for water transparency), so that water turbidity was selected by the model instead of water transparency.
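That nitrogen does not enter the classical index can be seen from the original formulation of Carlson (1977), which uses only Secchi depth, chlorophyll-a, and total phosphorus. The sketch below implements those three equations for illustration; the function names are assumptions, and the index applied in the present study may be a modified variant with different coefficients.

```python
import math

LN2 = math.log(2)

def tsi_secchi(sd_m):
    """Carlson TSI from Secchi disk transparency in metres."""
    return 10 * (6 - math.log(sd_m) / LN2)

def tsi_chlorophyll(chl_ug_l):
    """Carlson TSI from chlorophyll-a in micrograms per litre."""
    return 10 * (6 - (2.04 - 0.68 * math.log(chl_ug_l)) / LN2)

def tsi_total_phosphorus(tp_ug_l):
    """Carlson TSI from total phosphorus in micrograms per litre."""
    return 10 * (6 - math.log(48 / tp_ug_l) / LN2)

# Hypothetical inputs (illustration only, not measurements from the study)
tsi_sd = tsi_secchi(1.0)             # a 1 m Secchi depth corresponds to TSI 60
tsi_tp = tsi_total_phosphorus(96.0)  # doubling TP from 48 adds 10 units
```

Each halving of Secchi depth, or doubling of total phosphorus, raises the index by 10 units on the 0 to 100 scale.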

The outcomes of the present research (according to the one-way ANOVA) showed that the eutrophic condition in the Anzali wetland varies significantly in space and time. Some sites (e.g. 4 and 5) situated in the eastern part of the wetland are highly eutrophicated, particularly in hot seasons (e.g. summer). The main cause of pollution at these sites is the excessive amount of nutrients. In addition, various human activities take place in different parts of the Anzali wetland (Sadeghi et al., 2017). For the last few years, the wetland has been adversely affected by numerous pollutants. These contaminants result from the industrial, urban, and agricultural activities taking place in the surrounding areas (Vesali Naseh et al., 2012; Sadeghi, 2022). As a result, various levels of pollutants adversely affect the sampling sites in different seasons, making the Anzali wetland very polluted (Sadeghi et al., 2014). The most important pollution sources for the Anzali wetland are rivers, domestic sewage, industrial and agricultural wastewater, and hospitals. The wastewaters contain all sorts of pollutants such as heavy metals, nutrients, and precipitates (Sadeghi, 2022). Therefore, the main difference among highly eutrophicated sites (from eutrophic to hyper-eutrophic) may be linked to the extent of pollution. In contrast, some sampling sites (e.g. 6, 7, and 8) situated in the western part of the wetland are relatively less polluted (eutrophic condition) than other sites, particularly in the cold seasons (e.g. winter). This can be explained by the fact that water transparency in the western part of the wetland is relatively higher than at other sites owing to the deeper water.
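The one-way ANOVA underlying the spatial and temporal comparison can be sketched as the ratio of between-group to within-group variance. The function name `one_way_anova_f` and the grouped values are hypothetical and serve only to show the calculation of the F statistic; the p-value would then be read from the F distribution with the corresponding degrees of freedom.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least 2 non-empty groups and n_total > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical index values grouped by site (NOT data from the study)
site_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat = one_way_anova_f(site_groups)
```

A large F statistic indicates that differences between sites (or seasons) are large relative to the variation within them.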

In summary, to achieve a more reliable prediction of eutrophication in the Anzali wetland, the applied model can be extended by adding more variables (e.g. biological and chemical oxygen demand and total hardness). Moreover, more sampling sites need to be monitored across the entire Anzali wetland. Adding more sampling sites and measuring these additional variables should result in a more reliable prediction of eutrophication in future monitoring. This can, in turn, help wetland managers and decision-makers to appropriately protect this valuable ecosystem.

Conclusions

In the present research, the eutrophic condition in the Anzali wetland was first determined using the trophic state index (TSI). Besides the assessment of the TSI, a data-driven technique (the CT model) was implemented to identify the most important predictors (physical-structural variables, water quality variables, and variables related to the spatial and temporal patterns) for analysing eutrophication in the wetland. Based on the results of the TSI, all parts of the Anzali wetland are facing a eutrophication problem. Except for the western part of the wetland (classified as eutrophic), the other parts are classified as super- and hyper-eutrophic. It is concluded that appropriate parameter settings need to be applied for the CT model to successfully predict the trophic condition in wetlands. In addition, appropriate model calibration (e.g. finding the best pruning level and an optimal cross-validation of the model) may improve model performance and hence the selection of the most important explanatory variables for predicting eutrophication in the Anzali wetland. Based on the outcomes of the CT model, it is concluded that water quality variables (e.g. bicarbonate, pH, turbidity, electric conductivity, dissolved oxygen, and total phosphate) and physical-structural factors (e.g. water temperature and water depth) may jointly contribute to the prediction of the trophic conditions in the wetland. Consequently, future monitoring and management of this valuable wetland should focus on the variables selected by the model. Another conclusion derived from the current study (according to the ANOVA test) is that temporal and spatial patterns can influence the eutrophic condition in the Anzali wetland.