1 Introduction

Driven by the new standards of modern life and by ever-increasing population growth, the demand for water has increased considerably. These factors, together with others such as the inappropriate disposal of solid waste and of industrial and urban effluents, have led to a worrying increase in the pollution of water resources, bordering on irreversibility in more urbanized areas (Souza et al., 2014). In addition, pesticides and residues from agricultural activities have affected water quality for decades (Kumar, 2015; Silva et al., 2017). It is also worth noting that discharges of untreated domestic and industrial effluents into water bodies are a major concern in aquatic environments (Santos et al., 2018).

Currently, the population faces a series of environmental, economic, and social problems, which gives concerns about the environment and water quality great importance (Oliveira & Silva, 2014). This underscores the importance of monitoring the quality of water bodies, both to locate their point sources of pollution and to identify the group of pollutants responsible for the environmental imbalance (Pimenta et al., 2016).

Although extremely important, accurate assessment of the type and extent of water pollution is a difficult and complicated task (Huang et al., 2010). In this context, one of the problems faced in monitoring water quality is the complexity of analyzing a large number of variables (Iscen et al., 2008). When water resources are monitored for environmental purposes, indexes are commonly used to summarize the amount of data in a numerical form (Oliveira et al., 2019).

In this scenario, the Water Quality Index (WQI), developed by the National Sanitation Foundation (NSF) in 1970 in the USA, aims to compare and monitor qualitative changes in water resources (Ferreira et al., 2015). The index is calculated from variables that reflect the changes, whether anthropic or natural, that have occurred in the micro-basin of the water resource. The WQI also has the advantage of producing a value that is easy to communicate and understand, since it condenses the interactions and changes in the analyzed parameters, which have different units of measurement, into a single number (Centeno & Cecconello, 2016). In Brazil, this index was adapted by the São Paulo Environmental Sanitation Technology Company (CETESB) in 1975 to better fit Brazilian water bodies: from the parameters defined by NSF, nitrate was replaced by total nitrogen and total phosphate was replaced by total phosphorus, while the same quality weights and curves were maintained (Agência Nacional de Águas e Saneamento Básico (ANA), 2005).

Statistical methods are another important tool for monitoring water resources. These methods can provide a simple and objective interpretation of a complex data sample, which helps in understanding the effects of natural or anthropic processes that occur throughout the water body (Hair et al., 2009). Their application can, in addition to being more accurate, facilitate the manipulation of these data (Zhao et al., 2012). Along the same lines, Zhao et al. (2011) argue that advanced statistical methods are of great use for extracting meaningful information without loss of accuracy.

The statistical methods that can be used for these purposes are very diverse and cover different types of approaches. Although different measures have been proposed for assessing the similarity between variables, correlation coefficients and distance coefficients are the most commonly used (Bufon & Landim, 2007). Correlation methods can show how the parameters used to assess the quality of water resources relate to each other. There are different types of correlation, as well as different coefficients, some of which assume that the sample follows a normal distribution and others that do not. For samples that follow a normal distribution, it is common to use Pearson’s coefficient, while Spearman’s coefficient is widely used for data sets that are not normally distributed (Guimarães, 2017).

Recently, several studies have been conducted based on statistical analyses in order to assess water quality and to identify how the parameters are correlated. Authors such as Valentini et al. (2021a), Silveira et al. (2021), and Valentini et al. (2020) used statistical techniques such as principal component analysis (PCA), correlation analysis, and variance analysis to assess the water quality parameters of the Mirim Lagoon Hydrographic Basin in the extreme south of Brazil. Numerous studies have used these and several other statistical analyses for qualitative assessment and monitoring of water resources (Ewaid et al., 2018; Kangabam et al., 2017; Oliveira et al., 2019; Santos et al., 2020b; Valentini et al., 2021b; Vieira et al., 2019; Wu et al., 2018).

The study conducted by Valentini et al. (2021a), for example, evaluated the annual averages of the parameters used to calculate the WQI in the Mirim Lagoon Hydrographic Basin, finding strong and significant correlations between parameters such as phosphorus and total solids, BOD and dissolved oxygen, and turbidity and total solids, among others. Santos et al. (2020b), in turn, evaluated the water quality of the Moreira-Fragata Stream, a tributary in the aforementioned basin, and found strong and significant correlations between parameters such as temperature and conductivity, thermotolerant coliforms and nitrogen, and thermotolerant coliforms and phosphorus, among others. Both studies used correlation analysis to help understand the processes that could be causing pollution in the water resources analyzed and concluded that this technique is an extremely useful alternative for this purpose.

Furthermore, following this same line of study, Valentini et al. (2021b) used the correlation analysis to demonstrate that, with regard to the annual averages of the analyzed parameters, there was a divergence between which parameters actually contributed with greater intensity to the final result of the WQI, when compared with the relative weights defined by CETESB. Therefore, in their study also conducted in the Mirim Lagoon, Valentini et al. (2021b) used the technique of multiple regression analysis to generate a new WQI equation for this basin, using only those parameters that had strong and significant correlations with the result of the WQI adapted in Brazil by CETESB. Moreover, studies such as the one conducted by Cassalho et al. (2019) used the correlation analysis to investigate which geomorphological and climatic variables most significantly influenced the maximum streamflows in the state of Rio Grande do Sul, Brazil, and later also used a multiple regression analysis to model the regional equations of maximum streamflows for that state.

The Mirim Lagoon watershed is one of the main transboundary basins in South America (Piedras et al., 2012). Located between the extreme south of Brazil and the north of Uruguay, Mirim Lagoon is one of the most important freshwater lagoons in the Brazilian territory (Oliveira et al., 2015). Because a large part of the surrounding area is used for agriculture (Oliveira et al., 2015), Mirim Lagoon has several sources of pollution arising from this activity, as well as from the activities of the populations living around it. These sources can also be attributed to the release of untreated industrial and domestic effluents into the São Gonçalo channel (Souza et al., 2013). Variations in the quality and level of the waters of Mirim Lagoon may also be linked to climatic phenomena such as El Niño (Hirata et al., 2010). Consequently, conserving the water quality of this basin is highly necessary, as is the use of reliable techniques for monitoring it.

Therefore, the objective of this study is to analyze, through statistical methods, the influence of quality parameters on the WQI of Mirim Lagoon and to evaluate the impact that these parameters have on the quality of these waters, as well as to identify which of these parameters are more representative for the quality of this water body.

2 Methodology

2.1 Description of the Study Area

This study addresses the water body of Mirim Lagoon, located on the coastal plain of Rio Grande do Sul in the extreme south of Brazil. The lagoon has approximately 375 thousand hectares of water surface, which makes it the second largest water body with lake characteristics in the country, and it is connected to Lagoa dos Patos through the São Gonçalo channel (Oliveira et al., 2015).

The Mirim Lagoon Hydrographic Basin, in addition to its importance for aquatic biodiversity, is an important historical setting because of the international border with Uruguay (Steinke & Saito, 2008). Its waters are used mainly through direct extraction for agricultural use in rice fields in Brazilian and Uruguayan territory (IRGA, 2006). In addition, the waters of this lagoon are also used, after treatment, for human consumption by the surrounding populations, mainly in the municipalities of Pelotas and Rio Grande (SEMA, 2006).

2.2 Collections, Monitoring Points and Laboratory Analysis

For this study, 22 sampling campaigns were carried out over the years 2015, 2016, and 2017 at 7 monitoring points, as described in Table 1. These points span the entire length of Mirim Lagoon in Brazilian territory, as shown in Fig. 1.

Table 1 Description and coordinates of the collection points
Fig. 1 Location of the Lagoa Mirim Hydrographic Basin and its monitoring points

All samples collected at the monitoring points were safely stored and sent to the water and effluent laboratory of the Mirim Lagoon Watershed Development Agency (MLA), where the physical, chemical, and microbiological analyses necessary for calculating the WQI were carried out. All analyses were performed according to the methodology described by APHA (2017); only temperature was measured in situ, using a digital thermometer. The results of these analyses were taken from the MLA website, where they were publicly accessible during the period covered in this study.

2.3 Water Quality Index (WQI)

The WQI is calculated as the weighted product of its parameters, as shown in Eq. (1). The result is a number from 0 to 100, with higher values indicating better quality.

$$WQI=\prod_{i=1}^{n}{q_i}^{w_i}$$
(1)

where $q_i$ is the quality of the i-th parameter, a number between 0 and 100 obtained from the quality curve defined by NSF as a function of the measured concentration, $w_i$ is the weight of the i-th parameter, a number between 0 and 1, and n is the number of parameters (ANA, 2005).

The WQI calculation uses nine parameters, following the adaptation made by CETESB of the parameters defined by the NSF. They are shown in Table 2 together with their relative weights, which were stipulated according to the overall composition of water quality. The results of these analyses were used to calculate the WQI for each monitoring point in Mirim Lagoon for all the months monitored during the 3 years of study.

Table 2 Relative weights of the parameters for calculating the WQI
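
As a rough illustration of Eq. (1), the weighted product can be sketched in Python. The weights below are the values commonly cited for the CETESB adaptation and are an assumption here; they should be checked against Table 2. The parameter names, the function `wqi`, and the quality scores are hypothetical. In practice, each score would be read from the NSF quality curve for the measured concentration.

```python
import math

# Assumed relative weights (commonly cited CETESB values); verify against Table 2.
WEIGHTS = {
    "dissolved_oxygen": 0.17,
    "thermotolerant_coliforms": 0.15,
    "ph": 0.12,
    "bod": 0.10,
    "temperature": 0.10,
    "total_nitrogen": 0.10,
    "total_phosphorus": 0.10,
    "turbidity": 0.08,
    "total_solids": 0.08,
}


def wqi(q):
    """Weighted product of quality scores q_i (0 to 100) raised to weights w_i."""
    if set(q) != set(WEIGHTS):
        raise ValueError("q must provide one score for every weighted parameter")
    for name, score in q.items():
        if not 0.0 <= score <= 100.0:
            raise ValueError(f"score for {name} must lie between 0 and 100")
    return math.prod(q[name] ** weight for name, weight in WEIGHTS.items())


# Hypothetical scores: every parameter rated 80 on its quality curve.
example_scores = {name: 80.0 for name in WEIGHTS}
print(round(wqi(example_scores), 2))
```

Because the index is a product rather than a sum, a single parameter with a very low quality score pulls the whole index down sharply, regardless of how well the other parameters score.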

The results obtained by calculating the WQI are assigned to quality bands, as shown in Table 3. These bands differ between Brazilian states; this study adopts the bands used in the state of Rio Grande do Sul, where Mirim Lagoon is located.

Table 3 WQI classification for the state of Rio Grande do Sul

2.4 Statistical Analysis

For the statistical analyses, the laboratory results and the calculated WQI of each monitoring point were analyzed separately, in order to assess the correlation between the WQI of each point and the concentrations of the parameters that define this index. All statistical analyses were performed using the free ActionStat software for Excel.

First, normality tests were performed on the data from each monitoring point to assess the sample distribution and thereby define which correlation coefficient should be used. If the data follow a normal distribution, Pearson’s coefficient is used; if not, Spearman’s coefficient is used (Guimarães, 2017). The normality test used was the Kolmogorov–Smirnov (K-S) test at a significance level of 0.05 (95% confidence level).
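
The decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ActionStat procedure used in the study: the function names are hypothetical, the normal distribution is fitted with the sample mean and standard deviation, and the critical value uses the large-sample approximation 1.36/√n for a significance level of 0.05. Estimating the mean and standard deviation from the same sample makes the plain K-S test conservative, so a dedicated statistical package is preferable for real analyses.

```python
import math
from statistics import mean, stdev


def ks_statistic_normal(values):
    """K-S distance D between the empirical CDF and a fitted normal CDF."""
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        raise ValueError("constant data cannot be tested for normality")
    d = 0.0
    for i, x in enumerate(sorted(values), start=1):
        cdf = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def choose_coefficient(columns, critical_factor=1.36):
    """Return 'pearson' only if every column passes the K-S check, else 'spearman'."""
    for values in columns.values():
        critical = critical_factor / math.sqrt(len(values))  # approximation, alpha = 0.05
        if ks_statistic_normal(values) > critical:
            return "spearman"
    return "pearson"


# Hypothetical data for one monitoring point with 22 campaigns.
symmetric = [float(v) for v in range(22)]
skewed = [1.0] * 21 + [1000.0]  # e.g. one extreme coliform count
print(choose_coefficient({"a": symmetric}))
print(choose_coefficient({"a": symmetric, "b": skewed}))
```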

Before the normality test, the data were standardized to suppress differences in the orders of magnitude of the laboratory results, so that these differences would not interfere with the statistical analyses. This standardization followed the methodology proposed by Wilks (2006). Once the sample distribution had been evaluated, the correlation matrix was computed. This type of analysis, according to Reis (1997), allows the identification of subsets of variables that are highly correlated with each other. The correlation matrix was then analyzed to verify the relationship between the variables studied and the final WQI result at each monitoring point in Mirim Lagoon.
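
A minimal sketch of the standardization step is given below, assuming it is the usual z-score (each value minus the sample mean, divided by the sample standard deviation). The source only cites Wilks (2006) for the method, so this form is an assumption, and the function name and sample values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - sample mean) / sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


# Hypothetical turbidity readings in NTU.
turbidity = [2.0, 4.0, 6.0]
print(standardize(turbidity))
```

After this transformation every parameter has mean 0 and unit standard deviation, so a parameter measured in the thousands (such as coliform counts) and one measured in single digits become directly comparable. A rank-based coefficient such as Spearman’s is unchanged by the transformation, since it preserves the ordering of the values.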

3 Results and Discussion

3.1 Statistical Analysis—Correlations

Following the methodology, the sample distribution was first evaluated using the Kolmogorov–Smirnov (K-S) normality test. At no monitoring point did all the evaluated parameters follow a normal distribution, which rules out the use of Pearson’s correlation coefficient. Valentini et al. (2021a), also in a study of Mirim Lagoon, used the annual averages of the studied parameters to perform a correlation analysis between them, considering all monitoring points together. For these annual averages, the sample distribution was confirmed as normal, which made it possible to use a correlation matrix with Pearson’s coefficient.

It is worth noting, however, that in the present study the correlations between the parameters and the final WQI result were evaluated for each monitoring point separately. In this analysis, as previously mentioned, not all parameters followed a normal distribution, so a non-parametric coefficient had to be used for the correlation analyses that follow. Santos et al. (2020a), in a study carried out in the Moreira/Fragata stream in the municipality of Pelotas, also found parameters that did not follow a normal distribution and therefore used the non-parametric Spearman’s coefficient to build their correlation matrix. Accordingly, the parameter-WQI correlation analysis by monitoring point was performed using Spearman’s coefficient.
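
For readers unfamiliar with it, Spearman’s coefficient is Pearson’s coefficient applied to the ranks of the data, with tied values sharing their average rank. The sketch below illustrates this in plain Python; the function names and the sample values are hypothetical and do not come from the monitoring data.

```python
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical series: coliform counts rise while the WQI falls.
coliforms = [10, 100, 1000, 50000]
wqi_values = [90, 80, 60, 30]
print(spearman(coliforms, wqi_values))
```

Because only the ordering matters, the extreme coliform count in the last sample carries no more weight than any other observation, which is why the coefficient suits skewed water quality data.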

Although each parameter has its own numerical weight in the calculation of the WQI, the strength of its correlation with the final result of the index can vary depending on the factors that influence the pollution of the water body. The results of these correlations are shown in Table 4.

Table 4 Parameter-WQI correlation coefficient per monitoring point

Correlations with a coefficient greater than 0.5 in absolute value are considered strong, and the closer the absolute value of the coefficient (r) is to 1.0, the stronger the correlation (Helena et al., 2000; Levine et al., 2013). The absolute value is emphasized because negative correlations can also be strong. Generally speaking, when two variables have a strong positive correlation, their concentrations vary in the same direction, while variables with a strong negative correlation vary in opposite directions (Drose et al., 2020; Valentini et al., 2020).
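
The criterion above amounts to a simple filter on the absolute value of each coefficient. The sketch below applies it to a set of hypothetical coefficients, which are illustrative only and are not the values reported in Table 4; the function name is also an assumption.

```python
def strong_correlations(coefficients, threshold=0.5):
    """Return (parameter, r) pairs with |r| > threshold, strongest first."""
    strong = [(name, r) for name, r in coefficients.items() if abs(r) > threshold]
    return sorted(strong, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical parameter-WQI coefficients for a single monitoring point.
example = {"C.Therm": -0.72, "DO": 0.55, "pH": 0.10, "Turb": -0.50}
print(strong_correlations(example))
```

Sorting by absolute value keeps a strong negative correlation, such as the one between coliforms and the index in this example, ahead of a weaker positive one.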

As can be seen in Table 4, thermotolerant coliforms (C.Therm) was the parameter with the highest correlation with the final WQI result at points ML1, ML2, ML3, and ML7, while phosphorus had the highest parameter-WQI correlation coefficient at points ML4 and ML5. Dissolved oxygen (DO), despite being the parameter with the highest relative weight in the calculation of the WQI, as shown in Table 2, had the highest correlation with the WQI only at point ML6.

The fact that thermotolerant coliforms is the parameter with the highest correlation with water quality at points ML1, ML2, ML3, and ML7 may be associated with the proximity of these monitoring points to urban areas: points ML1, ML2, and ML3 are close to the cities of Arroio Grande and Jaguarão, and point ML7 is located in the port of Santa Vitória do Palmar. Corroborating this, several other studies carried out in Mirim Lagoon and in other water bodies near urban areas of that region report pollution from domestic sewage as a possible cause of the presence of thermotolerant coliforms.

Analyzing the monitoring points separately, at points ML1 and ML3 only the thermotolerant coliforms parameter had a strong correlation with the final WQI result. Point ML2, on the other hand, had strong parameter-WQI correlations with the variables C.Therm and DO, with the C.Therm-WQI correlation being the stronger of the two. The proximity of monitoring points ML1, ML2, ML3, and ML7 to urban areas can be seen in Fig. 2, most notably for points ML3 and ML7, which are close to the cities of Jaguarão and Santa Vitória do Palmar, respectively.

Fig. 2 Location of urban spots near the monitoring points of Mirim Lagoon

As for point ML1, it is near a bathing area, Pontal Beach, which has a small village belonging to the municipality of Arroio Grande, RS. According to Valentini et al. (2021c), the proximity of bathing areas can contribute to greater organic pollution from domestic effluents, resulting in a high concentration of coliforms, which reduces the WQI. Also, according to Agência Nacional de Águas e Saneamento Básico (ANA) (2020), the municipality of Arroio Grande, where this small village is located, does not treat the sewage generated by its population, which certainly contributes to a higher impact from organic pollution.

As for ML4 and ML5, phosphorus is the most representative parameter for their WQI. This can be linked to these points being farther from areas of dense urbanization and closer to areas of intense agricultural activity. Phosphorus in water is associated with nearby agricultural areas because it is one of the main components of the agricultural additives used on the region’s crops and may be carried by surface runoff to the water body (Drose et al., 2020; Rocha & Pereira, 2016). Moreover, the association between phosphorus in water and pollution from agricultural activities is well established in several studies of water quality (Guedes et al., 2012; Santos et al., 2020b, 2021; Valentini et al., 2020, 2021d).

In addition to the aforementioned phosphorus-WQI correlation, point ML4 also shows strong parameter-WQI correlations for thermotolerant coliforms and total solids. At point ML5, in addition to the correlations found at point ML4, parameter-WQI correlations were also found for DO and turbidity (Turb). Both points, as discussed above, are in areas that are more influenced by agricultural activities, with rice growing being the main driver of economic development in the region where these points are located (Steinke, 2007; Steinke & Saito, 2008).

The WQI of point ML6, the only point where DO was the most representative parameter for the final WQI result, has a strong correlation with phosphorus, thermotolerant coliforms, DO, and total solids. These strong parameter-WQI correlations indicate that the water quality at this point is strongly influenced by both agricultural and urban activities.

Drose et al. (2020) and Valentini et al. (2020), both in studies conducted in Mirim Lagoon, used principal component analysis and found the variables phosphorus, turbidity, and total solids grouped into the same principal component, which they associated with pollution from the agricultural activities carried out around this water resource. The variables dissolved oxygen and thermotolerant coliforms, in turn, may be associated both with organic pollution from domestic sewage and with agricultural pollution (Santos et al., 2020b; Valentini et al., 2021a).

4 Conclusion

The correlation methods used in this study made it possible to demonstrate which water quality parameters have the greatest influence on the final WQI result at each monitoring point in Mirim Lagoon. Furthermore, the parameter-WQI correlation analysis leads to the conclusion that the relative weights stipulated by the NSF for calculating the WQI, as adapted in Brazil by CETESB, do not best represent the composition of the water quality of Mirim Lagoon, which may indicate that these generalized weights are not suitable for all Brazilian water bodies.

As demonstrated by the correlation analysis, although each parameter has a relative weight in the calculation of the WQI, the parameters may affect the final WQI result with greater or lesser intensity depending on the factors responsible for the pollution of the site. One example, among others, is point ML4, where the WQI-phosphorus coefficient was higher than the WQI-DO and WQI-thermotolerant coliforms coefficients, even though the latter two parameters have higher relative weights, defined by NSF, in the calculation of the WQI.

It is also possible to conclude that statistical methods such as correlation are important for monitoring water quality, as they allow well-supported inferences about the monitoring data. Moreover, the findings of this study, despite referring only to Mirim Lagoon, may be relevant to other water resources that are affected by the same anthropic activities reported here. It is therefore possible that studies of water resources in other locations will also find parameters whose influence on the final WQI result is greater or smaller than the relative weights defined by CETESB would suggest. This reinforces the need for studies on quality indexes that take into account factors such as agricultural and industrial activities and land use and occupation in the regions surrounding the analyzed water bodies, in order to better define the weight of each parameter in the final index.