Introduction

Reservoirs are usually formed by damming a watercourse with lotic characteristics, which imposes a fundamental physical change on the river continuum. The flow velocity decreases as the river approaches the dam wall, and the reservoir created becomes a lake-like system. The formation of a lentic environment involves changes in the fluvial regime and in the residence time of the water, increased loss by evaporation, and stratification in vertical zones, in addition to effects on aquatic communities (Schmutz & Moog, 2018; Winton et al., 2019).

Reservoirs can have multiple functions besides power generation, such as flood control, water clarification, and water supply or retention (Chen et al., 2019). A reservoir with several functions offers greater advantages in improving the region in which it is installed and in stabilizing the operation of energy production (Penghao et al., 2019).

Brazil is one of the largest holders of reservoirs for electricity generation in the world (IHA, 2020), and the country has increasingly used its hydroelectric reservoirs for diverse purposes, which requires an assessment of their water quality. However, the lack of financial resources to establish consistent, long-term water quality monitoring programs means that studies on this subject are still scarce in the country. The Irapé hydroelectric power plant (HPP) reservoir is a large (142.95 km2) water body in the Brazilian semi-arid region with great potential to contribute to the region’s water supply. This region suffers from severe water shortages, and democratizing access to good-quality water resources remains an important issue.

Thus, water quality monitoring is essential to better understand the conditions and singularities of these water bodies (Soares et al., 2017). However, monitoring programs often generate data sets that are difficult to interpret and analyze for decision-making (Achieng et al., 2017; Muangthong & Shrestha, 2015), requiring the application of specific tools. Multivariate statistical analyses facilitate the interpretation of complex water quality data sets and are commonly used to make sense of large volumes of data (Varol, 2020). These analyses allow the identification of the status of the studied ecosystems and of the most relevant pollution sources affecting water quality (Achieng et al., 2017; Ustaoğlu et al., 2020a, b), supporting reliable decisions in water resources management.

Multivariate statistical techniques have been applied to reveal spatial and temporal variations in water quality and to identify the main parameters and impacts responsible for these variations (Achieng et al., 2017; Chounlamany et al., 2017; Hajigholizadeh & Melesse, 2017; Siepak & Sojka, 2017; Tepe & Aydin, 2017; Ustaoğlu et al., 2020a, b; Varol, 2020). Cluster analysis (CA) aims to divide a large group of objects into smaller classes (clusters) based on their similarities. The resulting clusters must therefore present high internal homogeneity and high external heterogeneity (Muangthong & Shrestha, 2015). CA can help to interpret data and indicate patterns (Hajigholizadeh & Melesse, 2017); it generates hypotheses (exploratory analysis) rather than confirming them (confirmatory analysis). Therefore, the results must later be validated through the application of other statistical methods.

The Kruskal–Wallis (KW) test is a nonparametric statistical test that assesses the differences among three or more independently sampled groups on a single, non-normally distributed continuous variable. As the Kruskal–Wallis test does not indicate specifically which groups differ significantly from the others, post hoc testing can be conducted. The most common post hoc test for the Kruskal–Wallis test is the Dunn test (Mangiafico, 2016). The Spearman rank-order correlation evaluates the monotonic relationship between two continuous or ordinal variables (Mian et al., 2021). These tests can be applied between the formed clusters to identify the differences that are significantly responsible for separating the monitoring stations into different groups and the relationships between the variables responsible for this distinction (Qu et al., 2019; Guo et al., 2020). Principal component analysis (PCA) is another multivariate technique widely used to assess water quality. Its main objective is to reduce the dimensionality of the data set by forming factors that represent the structure of the original data (Ustaoğlu et al., 2020a, b).

Hence, this study aims to apply statistical techniques to assess the surface water quality of the Irapé HPP reservoir and its tributaries. Univariate and multivariate techniques were applied to evaluate the spatial variability of monitoring stations and to understand the relationship between the main parameters and the lentic and lotic environments, considering data from the last 11 years (2008–2018).

Materials and methods

Study area

The Irapé HPP, or Presidente Juscelino Kubitschek HPP, was opened in 2006. It has a 142.95-km2 reservoir that covers seven municipalities, a maximum total volume of 5954.88 hm3, and an installed capacity of 399 MW. The operational water levels vary between 470.8 and 510 m (Cemig, 2016). It is located in the Alto Jequitinhonha watershed (JQ1 Water Resource Management Unit, WRMU), which has an area of 19,855 km2, covers 26 municipalities (Igam, 2010), and has a population of approximately 121,000 inhabitants. The region’s predominant activities are forestry (more specifically Eucalyptus sp.), agriculture, and livestock production (Silva & Miranda, 2015). The monitoring data were made available by the Energy Company of the State of Minas Gerais (Cemig) and covered 14 stations in the Irapé HPP reservoir and its tributaries, from 2008 to 2018. The geographic location and description of the stations are presented in Fig. 1.

Fig. 1 Geographic location of the Irapé HPP reservoir, its tributaries, and the water quality monitoring stations

Water quality data

The secondary data used in the present study were provided by Cemig and include 14 water quality parameters monitored quarterly for 10 years (2008–2018), collected and analyzed according to the Standard Methods for the Examination of Water and Wastewater (APHA, 2012). The following parameters were analyzed: total alkalinity (TAlc), thermotolerant coliforms (therm. coli.), electrical conductivity (EC), biochemical oxygen demand (BOD), dissolved iron (DFe), total phosphorus (TP), nitrate (NO3−), total ammoniacal nitrogen (TAN), dissolved oxygen (DO), in situ pH (pH), total dissolved solids (TDS), sulfate (SO4−2), water temperature (WT), and turbidity (turb.).

Statistical methods

For an overview of the water quality in the study region, descriptive statistics were calculated from the non-standardized data of each water quality parameter: mean, median, minimum, maximum, standard deviation, and 25th and 75th percentiles.
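As an illustration of the summary described above, the following minimal Python sketch computes the same statistics for a single parameter. The function name `describe` and the sample readings are hypothetical and are not taken from the Cemig data set. The original analysis was run in Statistica, whose percentile convention may differ from the `inclusive` method assumed here.

```python
import statistics


def describe(values):
    """Return the summary statistics reported for each water quality parameter."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points (25%, 50%, 75%).
    q25, _q50, q75 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "sd": statistics.stdev(ordered),  # sample standard deviation (assumption)
        "p25": q25,
        "p75": q75,
    }


# Hypothetical dissolved-oxygen readings (mg O2/L), for illustration only.
sample_do = [5.1, 6.4, 6.5, 7.0, 7.2, 4.8, 6.9, 6.1]
summary = describe(sample_do)
```

Applying `describe` to the raw (non-standardized) series of each parameter would give one row of a table such as Table 1.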

The concentrations of the selected parameters were standardized to the Z scale (Z = (X − μ)/σ) to eliminate the effects of differences in scale (Hair et al., 2009). The cluster analysis (CA) was then performed, with the objective of grouping similar monitoring stations according to the concentrations of the measured water quality parameters. The hierarchical agglomerative method was applied using Ward’s clustering technique, with the Euclidean distance as the measure of dissimilarity.
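The two steps above (Z-score standardization followed by Ward clustering on Euclidean distances) can be sketched with NumPy and SciPy. This is not the Statistica workflow used in the study: the station matrix below is invented for illustration, the variable names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical station-by-parameter matrix (6 stations x 3 parameters).
# Rows 0-2 mimic reservoir-like stations, rows 3-5 river-like stations.
station_means = np.array([
    [26.0, 5.0, 0.10],
    [26.5, 6.0, 0.12],
    [27.0, 5.5, 0.09],
    [22.0, 30.0, 0.60],
    [21.5, 28.0, 0.55],
    [22.5, 33.0, 0.65],
])

# Z = (X - mu) / sigma, column by column, so that no parameter dominates
# the distance calculation because of its measurement scale.
mu = station_means.mean(axis=0)
sigma = station_means.std(axis=0, ddof=1)  # sample SD (assumption)
z_scores = (station_means - mu) / sigma

# Hierarchical agglomerative clustering: Ward's criterion, Euclidean distance.
ward_tree = linkage(z_scores, method="ward", metric="euclidean")

# Request two groups to inspect the coarsest split of the dendrogram.
labels = fcluster(ward_tree, t=2, criterion="maxclust")
```

Each row of `ward_tree` records one merge step; its third column holds the connection distance at which the merge occurred.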

To validate the formed clusters, the cophenetic correlation coefficient (CCC) was calculated. It consists of Pearson’s linear correlation between the elements of the dissimilarity matrix (the distance matrix between the monitoring stations obtained from the original data) and the elements of the cophenetic matrix (the matrix of distances between the monitoring stations generated from the dendrogram), and it evaluates the fit between the two matrices (Sokal & Rohlf, 1962). Values closer to 1 indicate a better fit between the dissimilarity matrix and the obtained dendrogram and, therefore, a better representation of the results, whereas values < 0.7 indicate that the clustering method used should be questioned (Rohlf, 1970). The position of the Fenon line (cut line) used as a reference for the groupings was determined by the method proposed by Vicini (2005), which considers the connection distances between the formed clusters and places the line where the greatest variation between consecutive distances is observed.
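A minimal sketch of both validation steps is given below, using SciPy rather than Statistica. The points are invented, and the cut rule is one plausible reading of the largest-jump criterion described above (cut halfway between the two consecutive connection distances with the greatest difference), not a verified reproduction of Vicini (2005).

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical standardized data: three tight groups of stations.
points = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [5.0, 5.0], [5.2, 5.1],
    [10.0, 0.0], [10.1, 0.2],
])

original_dist = pdist(points, metric="euclidean")
tree = linkage(points, method="ward", metric="euclidean")

# cophenet returns the Pearson correlation between the original distances
# and the cophenetic distances read from the dendrogram.
ccc, cophenetic_dist = cophenet(tree, original_dist)

# Connection distance of each merge step (third column of the linkage matrix).
merge_heights = tree[:, 2]

# Largest jump between consecutive connection distances.
jumps = np.diff(merge_heights)
step = int(np.argmax(jumps))

# Place the cut line halfway between the two distances around that jump.
cut_height = (merge_heights[step] + merge_heights[step + 1]) / 2
groups = fcluster(tree, t=cut_height, criterion="distance")
```

With `criterion="distance"`, stations whose cophenetic distance does not exceed `cut_height` end up in the same group, which corresponds to reading the clusters below the cut line in the dendrogram.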

The normality of the data was assessed by the Shapiro–Wilk test, at the 0.05 significance level, applied for each parameter in each season. The Kruskal–Wallis nonparametric test, followed by a multiple comparisons test (when applicable), and Spearman’s correlation test, at the 0.05 significance level, were then applied between the formed clusters. These tests make it possible to observe which water quality parameters were significantly responsible for grouping the monitoring stations into different clusters and to understand the correlations between them.
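The sequence of tests described above can be sketched with `scipy.stats`. The data are simulated, the cluster names are hypothetical, and the post hoc multiple comparisons step (for example, the Dunn test) is not shown because SciPy does not provide it directly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05

# Hypothetical right-skewed turbidity values (NTU) for three clusters.
cluster_a = rng.lognormal(mean=1.5, sigma=0.5, size=40)
cluster_b = rng.lognormal(mean=3.0, sigma=0.5, size=40)
cluster_c = rng.lognormal(mean=3.1, sigma=0.5, size=40)

# Shapiro-Wilk: a p-value below alpha rejects the normality hypothesis.
shapiro_p = []
for group in (cluster_a, cluster_b, cluster_c):
    _w_stat, p_value = stats.shapiro(group)
    shapiro_p.append(p_value)

# Kruskal-Wallis: tests whether at least one cluster differs from the others.
kw_stat, kw_p = stats.kruskal(cluster_a, cluster_b, cluster_c)
clusters_differ = kw_p < alpha

# Spearman: monotonic association between two parameters within one cluster.
# The simulated "iron" series is built to increase with turbidity.
iron_b = 0.02 * cluster_b + rng.normal(0.0, 0.05, size=40)
rho, rho_p = stats.spearmanr(cluster_b, iron_b)
```

A significant Kruskal–Wallis result only indicates that some difference exists; identifying which pairs of clusters differ requires the post hoc step mentioned in the text.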

The PCA was based on the Spearman correlation matrix, and the components were obtained after varimax rotation. The two principal components that explained most of the variability in water quality were represented in a biplot containing information about the analyzed parameters and the monitoring stations under study. Since each data collection would otherwise represent an observation in the plot, the average of each parameter at each station was used for this analysis, to facilitate the interpretation of the results. All statistical tests mentioned were performed using the Statistica® 8.0 software.
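The following NumPy sketch shows one way to carry out a PCA on a Spearman correlation matrix and rotate the first two components with varimax. It is an illustration under stated assumptions, not the Statistica procedure: the data are random, the helper `varimax` is a commonly used SVD-based formulation written here for the example, and the loadings are scaled by the square root of the eigenvalues.

```python
import numpy as np
from scipy.stats import rankdata


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonally rotate a (parameters x components) loading matrix."""
    n_rows, n_cols = loadings.shape
    rotation = np.eye(n_cols)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.sum(rotated**2, axis=0)
        gradient = loadings.T @ (rotated**3 - rotated * col_ss / n_rows)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # product of orthogonal matrices stays orthogonal
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion / criterion < 1 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation


rng = np.random.default_rng(0)
# Hypothetical station averages: 14 stations x 5 parameters.
data = rng.normal(size=(14, 5))
data[:, 1] += data[:, 0]  # induce some correlation between parameters
data[:, 3] += data[:, 2]

# Spearman correlation = Pearson correlation of the column-wise ranks.
ranks = np.apply_along_axis(rankdata, 0, data)
spearman_corr = np.corrcoef(ranks, rowvar=False)

# Eigen-decomposition, sorted from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(spearman_corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()           # share of variance per component
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])
rotated_loadings = varimax(loadings)
```

Because the rotation is orthogonal, the variance jointly explained by the two retained components is unchanged; only its distribution between the two axes shifts, which is what makes the rotated loadings easier to interpret.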

Results and discussion

Descriptive statistics

The water quality of the HPP Irapé reservoir and its tributaries was evaluated by comparing the descriptive statistics obtained for the analyzed parameters with the standards of Brazilian legislation established in CONAMA Resolution No. 357/2005 (Brazil, 2005) for Class 2 waters, which are surface waters that can be used for human consumption after conventional treatment. The descriptive statistics for each parameter are summarized in Table 1.

Table 1 The physicochemical parameters analyzed in the study area

The DFe value drew attention in the present study, being the only parameter whose median (0.35 mg Fe L−1) exceeded the national resolution limit (0.3 mg Fe L−1). The temporal variation makes the result even more concerning, since DFe showed a significant upward trend over the same period analyzed in the present study (Oliveira et al., 2021). High metal concentrations in the region could be of natural origin. According to the HPP Irapé Environmental Plan for Conservation and Use of Water Surrounding the Reservoir (PACUERA) (Cemig, 2012), the HPP Irapé region is characterized by the presence of red oxisols, which contain a high iron oxide (Fe2O3) content. The result may also be associated with the low pH values found in the waters of the study area, since acidic conditions tend to favor the dissolution of iron (Liu et al., 2019).

The pH median (6.95) remained within the range established by national legislation (between 6 and 9). However, the minimum value (2.99) and the 25% percentile (6.30) demonstrated acidity of the water at some points and moments of the analyzed historical series. As for DFe, the acidity may be associated with the predominance of red oxisols in the watershed, which are characteristically acidic soils. In addition, the contact of aerated reservoir water with the dam itself leads to the oxidation of sulfide minerals present in the rockfill, which was built with rocks from the region (Cemig, 2016). Low pH values can affect the growth of the ichthyofauna and cause its death when below 4.5. Moreover, relatively high concentrations of metals are often detected in acidic waters because low pH values solubilize these metals, which can negatively affect aquatic organisms and human health (Nienie et al., 2017; Saito & Watanabe, 2020).

The DO is an important parameter in water quality as it is a limiting factor for aquatic life (Ustaoğlu et al., 2020a, b). Regarding the oxygenation of the reservoir and its surroundings, the DO values varied between 0.60 and 9.60 mg O2 L−1. The median (6.49 mg O2 L−1) complied with the limit established in the national legislation (> 5 mg O2 L−1), although the minimum value was below this limit. The 25% percentile (5.70 mg O2 L−1) nevertheless indicates that fewer than 25% of the measured values violated the standard. The BOD values ranged from 0.10 to 15.20 mg O2 L−1. The maximum value observed (15.20 mg O2 L−1) exceeded the national resolution limit (5 mg O2 L−1). However, the median (3 mg O2 L−1) and the 75% percentile (3.23 mg O2 L−1) demonstrate that most of the observed BOD values remained below the maximum allowed. Thus, the results suggest that the oxygenation conditions of the reservoir and its tributaries are, in general, satisfactory.

The TP concentrations varied between 0.002 and 2.26 mg P L−1. The maximum value observed exceeded the limits imposed by national legislation for both lotic environments (0.1 mg P L−1) and lentic environments (0.03 mg P L−1). National legislation is more restrictive for lentic environments since TP is one of the main factors responsible for the excessive growth of primary producers in these environments (Moal et al., 2019). Excess nutrients result in the overgrowth of phytoplankton and macrophytes and in the proliferation of toxic algae, which can cause ecological and toxicological effects that are directly or indirectly related to that proliferation. Dissolved oxygen is consumed and depleted as bacteria break down carbon in dead plant material, which can cause hypoxia, resulting in fish death, loss of aquatic biodiversity, and many other adverse ecological effects (Moal et al., 2019; Pérez-Gutiérrez et al., 2017). The median (0.02 mg P L−1), however, complied with the legislation for both types of environments.

Turbidity is caused by suspended particles or colloidal substances that hinder the transmission of light through the water, affecting local flora and fauna (Ustaoğlu et al., 2020a, b). The median turbidity (12.65 NTU) was within the legal limit (100 NTU), although the maximum value observed exceeded the limit imposed in the resolution. However, the 75% percentile (26.34 NTU) shows that at least 75% of the values are below the standard. According to Oliveira et al. (2021), all turbidity values that violated national legislation in the HPP Irapé region were observed in lotic environments. The absence of violations in the reservoir can be explained by the sedimentation of particles, which is common in environments with lower water flow, considering that the present study analyzed only the surface waters of the lake. Lower turbidity values in lentic environments were also observed by Wu et al. (2016) and Li et al. (2019).

In Brazil, one of the main problems related to the deterioration of water quality is the contamination of water resources by domestic sewage, since only 62.8% of the country’s municipalities have sewage treatment plants in operation (IBGE, 2020). Thermotolerant coliforms are among the main bacteria used as indicators of fecal pollution in water. However, the median value of this parameter in the HPP Irapé region (30 MPN 100 mL−1) was below the limit (< 1000 MPN 100 mL−1) imposed by the national resolution. Although the maximum observed value (15,000 MPN 100 mL−1) exceeds the limit, at least 75% of the values are below the standard, since the 75% percentile was 350 MPN 100 mL−1. These results can be explained by the favorable flow rates and regeneration conditions promoted by the slopes of the river beds in the study watershed, which reduce the impact of the release of domestic sewage from relatively small and sparse urban agglomerations (Igam, 2014). Another point worth mentioning is the standard deviation calculated for thermotolerant coliforms (1500.90 MPN 100 mL−1), which shows a great spatial and/or temporal variation of the individual values obtained. In fact, Oliveira et al. (2021) identified a significant influence of seasonality on the thermotolerant coliform data from HPP Irapé, mainly in lotic environments, which showed higher values in rainy periods.

The median and the 75% percentile of NO3− (0.14 mg N L−1 and 0.25 mg N L−1), SO4−2 (1.60 mg SO4 L−1 and 4.68 mg SO4 L−1), TDS (26.50 mg L−1 and 52.5 mg L−1), and TAN (0.06 mg N L−1 and 0.10 mg N L−1) demonstrated that most of the analyzed values are in accordance with national legislation (10 mg N L−1, 250 mg SO4 L−1, 500 mg L−1, and 0.50 mg N L−1, respectively), thus not contributing to the degradation of the region’s water quality. However, TAN in the Irapé reservoir has shown an increasing trend over the same period as the present study, which may be associated with a contribution from agricultural sources in the watershed (Oliveira et al., 2021). Therefore, preventive measures must be taken, since high TAN concentrations are toxic to aquatic life.

Cluster analysis

The dendrogram formed by the CA is shown in Fig. 2, while the monitoring stations, highlighted according to the groups formed in the CA, are presented in Fig. 3.

Fig. 2 Dendrogram of the CA (Ward’s method) from the Irapé HPP water quality monitoring data, considering 14 water quality parameters in 14 monitoring stations from 2008 to 2018

Fig. 3 Monitoring stations highlighted by the groups formed in the CA

The cophenetic correlation coefficient was 0.75, indicating that the applied cluster method (Ward) was adequate, with a good adjustment and low distortions in the measurements between the original and the generated matrices (Rohlf, 1970). Through the connection values between the groups (Table 2), it was possible to observe that the highest jump (4.83) occurred between the grouping steps 10 and 11, which correspond to the connection distances 27.08 and 31.91. The Fenon line (cut line) was drawn between these two distances, with an average value of 29.5 in the dendrogram, as proposed by Vicini (2005). It is also observed that only after the 10th step of the algorithm did the lotic and lentic environment stations begin to group, clearly showing a distinct behavior between these two environments.
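The position of the cut line reported above can be checked with simple arithmetic. The sketch below reuses only the two connection distances quoted in the text; the variable names are illustrative.

```python
# Connection distances at grouping steps 10 and 11, as quoted in the text.
distance_step_10 = 27.08
distance_step_11 = 31.91

# Jump between the two consecutive connection distances.
jump = distance_step_11 - distance_step_10

# Cut line placed halfway between the two distances.
cut_height = (distance_step_10 + distance_step_11) / 2

print(f"jump = {jump:.2f}, cut line = {cut_height:.3f}")
```

The midpoint of 29.495 is consistent with the rounded value of 29.5 used for the Fenon line.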

Table 2 Values of the connection distances (CD) in which the groups were formed throughout the cluster analysis steps

Therefore, the formation of four clusters was verified: Cluster I (C1), Cluster II (C2), and the stations VIR95-LO (C3), and VIR60-LE (C4).

The C1 is formed by four stations located in the Irapé HPP reservoir (VIR50-LE, VIR40-LE, VIR30-LE, and VIR20-LE) and three in the lotic environments (VIR70-LO, VIR115-LO, and VIR09-LO). However, the stations VIR70-LO and VIR115-LO, although in a lotic environment, are located immediately downstream of the reservoir. VIR09-LO is the station closest to the reservoir and may therefore present features closer to it. A similar result was observed by Christofaro et al. (2017). These results suggest that in addition to the type of environment, the proximity to the reservoir was also responsible for the water quality change.

In C2, all grouped stations (VIR08-LO, VIR10-LO, VIR11-LO, VIR06-LO, and VIR03-LO) are in lotic environments, the latter being the only one on the main river (Jequitinhonha) upstream of the reservoir. The other stations in this cluster are in the affluent water bodies of the Jequitinhonha (Itacambiruçu, Soberbo, Noruega, and Corrente rivers), tributaries of the reservoir. Therefore, all these points represent the natural conditions of the water courses, without the dam effects.

The other stations analyzed in the study, C3 (VIR95-LO) and C4 (VIR60-LE), remained isolated. The VIR95-LO station, located in the city of Coronel Murta, is the most distant station downstream of the dam and is the only station that receives contributions from the Vacaria River, an important tributary of the Jequitinhonha river. VIR60-LE is the lentic station closest to the dam and may be suffering the effect of the mixture associated with the water intake of the reservoir. This factor may explain its isolation from other reservoir stations.

The results demonstrate that lotic and lentic environments, as well as the distance from the dam, influence water quality, which was also verified in other studies in reservoirs (Toklati et al., 2015; Trindade et al., 2017).

Nonparametric statistics

The normality test identified that most of the analyzed data do not follow the normal distribution, justifying the use of non-parametric tests in the present study.

The KW test showed significant differences (p < 0.05) between the clusters for 7 of the 14 parameters analyzed in this study, namely, thermotolerant coliforms, DFe, TP, DO, TDS, WT, and turbidity. The multiple comparison test applied to these cases made it possible to observe between which groups/stations the significant differences occurred. Although they identify the significant differences between the groups, the tables generated in this test do not indicate which groups/stations presented better or worse water quality for each analyzed parameter. Therefore, boxplots of each parameter were plotted (Fig. S1) to visualize the variation of the parameter in each group formed in the CA and to help interpret the multiple comparison test results. The results compiled from the multiple comparison tests and the boxplots for all parameters that showed a significant difference are presented in Table 3.

Table 3 Final results after compilation between the multiple comparison tests and generated boxplots for all parameters that showed a significant difference (p < 0.05 in the KW test) between the CA groupings formed

Table 3 shows that thermotolerant coliforms have significantly lower values in C4 and C1 than in C3 (located on the main river) and C2. The lower coliform values in lentic and nearby environments may be associated with the significantly higher temperatures observed in C1 (in relation to C2 and C3) and in C4 (in relation to C2), since bacteria are highly sensitive to solar intensity and temperature increases (Cho et al., 2012). In turn, the temperature difference between these environments can be explained by the stratification process common in reservoirs, which results in a warmer surface layer (Elçi, 2008; Tao et al., 2020). Tao et al. (2020) identified that the construction of the Three Gorges reservoir in China had a greater impact on the water temperature of the Yangtze river than natural changes in air temperature. Kędra and Wiejaczka (2018) also observed changes in the thermal regime of the Dunajec river after the construction of artificial reservoirs along the river.

The DFe behaved similarly to coliforms, with significantly lower concentrations in stations with or near lentic characteristics. This result may have been found because the present study analyzed only the surface waters of the water bodies. Iron concentrations in the reservoir water are directly linked to the leaching of the metal from sediments due to the local geochemical characteristics, since the region is characterized by red latosols with a high Fe2O3 content (Cemig, 2016). Therefore, the lower water flow in a lentic environment causes the deposition of contaminated sediments at the reservoir bottom, which may lead to higher concentrations of iron, in its various forms, at this depth. This result is in accordance with those obtained by Christofaro et al. (2017), who found higher concentrations of metal in lotic stations and in bottom stations when analyzing the water quality of the Nova Ponte reservoir in Brazil and its tributaries.

The TP concentrations differed significantly only between the groups C1 and C2, proving to be lower in the first group. As observed for the DFe, the lower concentrations in C1 may be associated with exclusively superficial analyses of the water bodies. This is because the reservoirs tend to function as a kind of phosphorus “sink,” retaining the nutrient in their sediments, by adsorption (Tang et al., 2019; Wu et al., 2016). The lower water flow in this type of environment tends to increase the deposition of the sediments at the bottom of the water body, with consequent phosphorus deposition. This phenomenon was observed by Tang et al. (2019) at the Three Gorges reservoir, China, where excessive phosphorus loading was identified as an urgent problem during the post-reservoir period. The authors also observed that despite the existence of this problem, no obvious phosphorus enrichment occurred in the upper 20 cm of the water column.

The DO values in C1 were significantly lower than in the other groups. These results may be associated with greater biological activity in reservoirs, which can lead to greater oxygen consumption and thus explain the lower DO values in these environments (Hauer & Hill, 2007).

The C3 showed TDS concentrations significantly higher than C1 and C4, as well as turbidity significantly higher than the other clusters, which may explain its isolation compared to other stations located in lotic environments.

C1 is very similar to C4, differing only by a significantly lower DO concentration. Thus, the higher DO concentrations may explain the separation of C4 from the other stations located in the reservoir.

The clusters inside the reservoir and nearby (C1 and C4) showed lower values of thermotolerant coliforms, DFe, TP, TDS, and turbidity, suggesting a better water quality of the stations belonging to these clusters, as compared to the stations of their tributaries (C2). Xin et al. (2015) found similar results when comparing the water quality of the Danjiangkou reservoir, in China, with its tributaries. The results of this study may indicate the influence of the reduction in water flow speed with consequent deposition of metals, nutrients, and particles by sedimentation, as well as the effects of the circulation and distribution of pollutants due to vertical stratification, a common reservoir phenomenon (Christofaro et al., 2017; Li et al., 2019; Wang et al., 2009).

Spearman correlation

The correlations between the parameters that differed in the KW test, within each cluster, are shown in Table 4. All significant relationships in C2 and C3, both groups with stations in a lotic environment, were positive. Turbidity showed a significant correlation with thermotolerant coliforms (r = 0.255 and r = 0.577), DFe (r = 0.391 and r = 0.578), and TP (r = 0.323 and r = 0.630) in these clusters, in addition to a significant correlation with WT (r = 0.312) in C2 and with TDS (r = 0.414) in C3. These parameters were precisely those for which statistically significant differences were identified between the groups formed by lentic stations or stations close to the reservoir and those formed by lotic stations, the latter with higher concentrations (Table 3).

Table 4 Spearman correlation results between the parameters that differ in the KW test within each cluster

A common factor that can explain all these relationships is the contribution from rural sources. It is noteworthy that the watershed under study has one of the highest percentages of rural population in the country, with agriculture and livestock being the predominant activities in the region (Oliveira, 2021). These activities are associated with an increase in soil erodibility, with sediment input to the beds of water bodies that directly affects their turbidity and TDS. Particles rich in Fe, due to the local geochemical characteristics, and in TP, due to the application of fertilizers to crops and the excrement produced by livestock, increase the concentrations of these elements in the rivers of the region.

The increase in DFe concentrations in water due to soil composition has also been identified in other studies (Christofaro et al., 2017; Kändler et al., 2017), as has the increase in TP due to agricultural activities (Hajigholizadeh & Melesse, 2017; Aydin et al., 2020; Ustaoğlu et al., 2020; Varol, 2020). Excrement can also explain the results obtained for thermotolerant coliforms (Cho et al., 2012), since these bacteria are present exclusively in the intestines of homeothermic animals, being one of the main indicators of fecal pollution (Lugo et al., 2021). In fact, this parameter showed a significant and direct correlation with TP in both groups (r = 0.393 and r = 0.396), corroborating the association of both variables with contamination by animal feces from livestock.

In the clusters formed by stations located in the reservoir and close to it, turbidity showed less interference in water quality, correlating with few variables. In C1, the variable was significantly related only to DFe (r = 0.433). These results are consistent with the previous discussion and can be explained by the sedimentation process in the reservoir due to the lower water flow in this environment (Li et al., 2019; Wang et al., 2009).

In C4, thermotolerant coliforms showed a significant inverse correlation with DFe (r = −0.522) and turbidity (r = −0.482), indicating that the water quality in VIR60-LE does not suffer as much interference from livestock, being only related to soil erodibility. In fact, TP did not present a significant correlation with this variable. The effects of pollutant circulation and distribution due to vertical stratification, a common process mainly in large reservoirs, may also have influenced the result (Li et al., 2019; Wang et al., 2009), since the present study analyzed only the surface waters of the lake.

Principal component analysis

The loading plot of the first two factors, obtained after varimax rotation using the Spearman correlation matrix, is shown in Fig. 4. The first two components generated by the PCA explain 54.58% of the total data variability (30.54% by PC1 and 24.05% by PC2).

Fig. 4 Loadings of the first two principal components extracted after varimax rotation, for the water quality database of the HPP Irapé region

The PCA applied in this study supports the distinct behavior observed between stations located in lotic environments and those located in lentic environments, as previously mentioned. All reservoir stations (VIR20-LE, VIR30-LE, VIR40-LE, VIR50-LE, and VIR60-LE) are on the left of the plot, in the 2nd and 3rd quadrants, while those located on the main river (VIR03-LO, VIR70-LO, VIR115-LO, and VIR95-LO) and its tributaries (VIR06-LO, VIR08-LO, VIR09-LO, VIR10-LO, and VIR11-LO) are on the right, in the 1st and 4th quadrants.

Regarding the analyzed parameters, water temperature differs the most from the others (Fig. 4). This variable is more associated with the stations located inside the reservoir, which present, in general, higher averages, as observed in Table S1. The other parameters explain more strongly the variability among the stations located in lotic environments. In particular, the parameters DFe, turbidity, and thermotolerant coliforms stand out: their loading vectors were closer to 1 on the PC1 axis (Fig. 4), and their calculated means and standard deviations were, overall, visibly higher in the main river and its tributaries (Table S1).

The analysis of the angles formed between the loading vector of each variable and the component axes indicates that the parameters most associated with PC1 may be related to agricultural activities, such as thermotolerant coliforms, turbidity, and TAN, and to natural factors, such as temperature and DFe, according to local geochemical characteristics. These results corroborate the previous analyses. For PC2, the variables of greatest weight seem to be related to the acidity of the region's water (TAlc, pH, and SO4−2), explained by the soil characteristics and the composition of the rocks used in the construction of the HPP Irapé dam (Cemig, 2012).

Conclusion

The descriptive statistics indicated that, in general, the water quality of the HPP Irapé reservoir and its tributaries is adequate for their uses, except for DFe.

The CA separated the stations into four clusters, two consisting of isolated monitoring stations and two of larger groups. Cluster I (C1) consisted of stations with greater influence from lentic environments and Cluster II (C2) consisted of stations with more influence from lotic environments, located upstream of the reservoir. These results demonstrate the reservoir influence on the regional water quality. The distinct behavior between the lentic and lotic environments was also evident in the PCA.

The analyses employed indicated better water quality at the stations in the reservoir. This can be associated with the sedimentation of pollutants due to the slower water flow and with vertical stratification. However, this study analyzed only the reservoir surface waters, due to the absence of data at other depths. Thus, it is suggested that future work consider the entire water column of the reservoir to complement this study. Another suggestion is the inclusion of biological data and a deeper exploration of data on metals, given the low pH values found in the region and the great impact that these components have on aquatic biota.

Finally, Spearman’s correlation analysis and PCA indicated that the main variables responsible for altering the water quality of the region can be associated with natural factors, such as local geological characteristics, as well as with agricultural and livestock activities in the region.