Introduction

Water quality encompasses the physical, chemical, and biological characteristics of the water bodies and can be altered by natural or anthropogenic processes (Grosbois et al. 2001; Ko et al. 2009; Hamid et al. 2020). Evaluating quality is crucial since surface freshwaters (e.g., rivers and streams) provide resources for several human activities (e.g., irrigation, navigation, energy generation, and water supply) with contrasting water quality requirements. Natural water quality conditions in aquatic systems are driven by hydrological and non-hydrological processes such as atmospheric deposition, surface runoff, particle dissolution and sedimentation, weathering of rocks, biological uptake, sorption, and desorption (Meybeck and Helmer 1989; Stream Solute Workshop 1990; Akhtar et al. 2021). Thus, water quality can exhibit spatial and temporal patterns depending on the interactions among these processes within riverine networks. Additionally, anthropogenic activities such as point (e.g., wastewater discharges) and nonpoint sources of pollution (e.g., nutrients from agricultural areas) (Meybeck 2003; Hamid et al. 2020; Akhtar et al. 2021) can influence spatial and temporal patterns of water quality. There is still modest understanding of the relationship between landscape characteristics (e.g., land use, geology, hydrology) and the associated responses (variation) in the water quality (Shi et al. 2017; Li et al. 2018; Lintern et al. 2018; Rodrigues et al. 2018; Yadav et al. 2019; Aalipour et al. 2022). These features and their complex interactions make water quality characterization a challenging task in rivers and streams (Sanders et al. 1983; de Bastos et al. 2021). Consequently, well-structured monitoring programs are essential for supporting the water resources management initiatives with representative data.

Water quality monitoring networks (WQMNs) were first established in the late 1960s to describe the general water quality status (Harmancioglu et al. 1998; Strobl and Robillard 2008). Planning these WQMNs has challenged water resources managers attempting to balance technical (e.g., spatial and temporal representativeness) and administrative aspects (e.g., human and financial resources, logistics, legal requirements) (Behmel et al. 2016; Nguyen et al. 2019; Jiang et al. 2020). Additionally, more complexity is attached to network design due to the absence of a single method suitable for all watersheds, as each one has specific natural and/or anthropogenic features (e.g., geology, hydrology, land uses, sources of pollution) that possibly demand customized approaches (Behmel et al. 2016).

The WQMN planning in many countries up to the 1990s was frequently based on subjective professional judgment and on logistics for determining the temporal and spatial elements of the WQMNs (Strobl and Robillard 2008; Mavukkandy et al. 2014; Nguyen et al. 2019). Introducing new monitoring sites and updating water quality parameters to comply with changes in legislation are relatively common in WQMNs. However, reassessing the initial design and goals was historically neglected (Harmancioglu et al. 1998) since changes in the monitoring programs were treated as obstacles to evaluating long-term trends in water quality (Strobl and Robillard 2008). Now, WQMN reassessment is a crucial strategy for (1) identifying deficiencies in the current WQMNs (e.g., insufficient or unreliable data to meet the monitoring goals) (Harmancioglu et al. 1998; Strobl and Robillard 2008), (2) addressing emerging water quality problems (Yudina et al. 2021), (3) incorporating new monitoring technologies for improving WQMNs’ efficiency (Jiang et al. 2020), and (4) updating the monitoring goals (Harmancioglu et al. 1998; Strobl and Robillard 2008).

High monitoring costs associated with insufficient data for responding to the specific water quality problems (e.g., eutrophication, acidification, and metal contamination) evidenced the inefficiency of past strategies (Harmancioglu et al. 1998; Camara et al. 2020). Numerous studies assessed the existing WQMNs to address these deficiencies (Mei et al. 2011; Chen et al. 2012; Guigues et al. 2013; Mavukkandy et al. 2014; Behmel et al. 2019; Camara et al. 2020; Asadollahfardi et al. 2021; Varekar et al. 2021), including the revision of monitoring sites’ locations, sampling frequency, and water quality parameters measured (Nguyen et al. 2019; Jiang et al. 2020). Specifically for the WQMN sampling frequency, the most common techniques are confidence intervals (Khalil et al. 2014), entropy (Mahjouri and Kerachian 2011), trend analysis (Naddeo et al. 2007, 2013; Scannapieco et al. 2012), analysis of variance (Guigues et al. 2013), multivariate statistical analysis (Calazans et al. 2018a; Peña-Guzmán et al. 2019), analytic hierarchy process (Do et al. 2013), water quality modeling (Vilmin et al. 2018), and machine learning techniques (Singh et al. 2011; Scannapieco et al. 2012; Ji et al. 2022).

There is no scientific consensus about a single method for defining the WQMN sampling frequency but the minimum frequency that provides reliable estimates of the water quality indicators is a goal of any WQMN program. Appropriate frequency avoids redundant or insufficient data (Strobl and Robillard 2008; Liu et al. 2014) and depends on both parameters of interest and sampling location (László et al. 2007; Khalil et al. 2014; Vilmin et al. 2018; Coraggio et al. 2022) as well as the planned monitoring goals (Liu et al. 2014; Vilmin et al. 2018). Higher sampling frequencies are needed for more variable quality parameters (László et al. 2007; Strobl and Robillard 2008; Khalil et al. 2014; Coraggio et al. 2022). Temporal variations in water quality can occur in the inter-annual (Vercruysse et al. 2017; Long et al. 2022; Mai et al. 2022), seasonal (Ouyang et al. 2006; Jeon et al. 2020; Ogwueleka and Christopher 2020), or even daily scales (Miltner 2010; Jones and Graziano 2013; Vilmin et al. 2018; Bega et al. 2021). Some commonly reported key drivers of temporal variation of water quality were streamflow conditions, water temperature, soil moisture (Guo et al. 2019), vegetation cover, surface runoff, rainfall (Liu et al. 2021), hydro-meteorological conditions, landscape disturbances, human activities (Vercruysse et al. 2017), and biogeochemical processes (Jones and Graziano 2013; Vilmin et al. 2018). Such variability is usually not spatially homogeneous across drainage networks and among water quality parameters (Shehane et al. 2005; Ouyang et al. 2006; Khatri and Tyagi 2015; Taka et al. 2016; Igwe et al. 2017; Simedo et al. 2018; Lei et al. 2021; Liu et al. 2021). Consequently, adaptive sampling frequencies can help to reduce costs and increase the networks’ spatio-temporal representativeness (László et al. 2007; Khalil et al. 2014; Loga et al. 2018; Vilmin et al. 2018; Coraggio et al. 2022). 
This flexible approach is already present in some WQMNs in Europe (Arle et al. 2016) and the USA (Riskin and Lee 2021), but it is still uncommon in several WQMNs worldwide, including that of São Paulo State (Brazil). Some flexibility is nonetheless observed in São Paulo State, since some time- and resource-consuming water quality parameters had a different sampling frequency than the more traditional ones.

Historically, spatial optimization of WQMNs received more attention than sampling frequency (Behmel et al. 2016; Nguyen et al. 2019; Jiang et al. 2020). However, the sampling frequency selection is crucial for planning well-structured WQMNs to ensure adequate representation of variance (László et al. 2007; Khalil et al. 2014; Liu et al. 2014; Vilmin et al. 2018; Piniewski et al. 2019; Thompson et al. 2021) while minimizing costs of the monitoring programs (Strobl and Robillard 2008; Scannapieco et al. 2012; Naddeo et al. 2013; CCME 2015). Such aspects are especially significant in developing countries where the financial resources for planning, operating, and maintaining the WQMNs are usually limited, but the anthropogenic pressures on water quality are intensifying (Capps et al. 2016; Ma et al. 2020). These challenges are also present in Brazil, where 49% of the sewage is untreated (Brasil 2021a) and represents one of the main causes of surface water pollution. Additionally, water pollution from agricultural (Cruz et al. 2019; de Oliveira et al. 2021; Fraga et al. 2021), mining (Soares et al. 2020; dos Santos Vergilio et al. 2021; Viana et al. 2021), and industrial activities (Martinelli et al. 2013; Silva et al. 2016; Tonhá et al. 2021) is also a concern.

Despite the several water quality challenges in Brazil, a legal framework to guide the water quality monitoring of aquatic systems is still lacking. However, especially after the 1980s, several standards were introduced into the Brazilian legislation to regulate the uses of environmental resources, such as surface waters (e.g., Brasil 1997, 2005, 2011). Consequently, knowing the water quality conditions of the aquatic systems became crucial to meet legal requirements and pushed Brazilian states to develop their own WQMNs. This non-coordinated approach led to the production of water quality data that is difficult to compare on a national scale since the state-level WQMNs have different design and operational strategies (e.g., water quality parameters, sampling frequencies) (ANA 2018). The Brazilian regulation for surface freshwaters (Brasil 2005) established the minimum sampling frequency only for E. coli (or thermotolerant coliforms), while for the other parameters, there is no suggested frequency. Interestingly, a more uniform approach is adopted for monitoring the quality of water bodies specifically intended for domestic use since the Brazilian regulation (Brasil 2021b) establishes minimum requirements for sampling frequency and water quality parameters.

São Paulo State has been a leader in quality monitoring in Brazil, with a WQMN established in 1974 (Midaglia 2011). It currently has more than 500 monitoring sites with a bimonthly frequency and measurements of up to 60 parameters in each site (CETESB 2021). The São Paulo WQMN site density is approximately 1.9 sites/1,000 km² (CETESB 2020), which is higher than both the European Environment Agency’s recommendation (1.0 site/1,000 km²) (Nixon et al. 1998) and the national average in Brazil (~ 0.3 site/1,000 km²) (ANA 2019), but lower than the site density in Italy, France, and the UK (above 3.0 sites/1,000 km²) (EEA 2022). The network is biased toward highly modified aquatic systems, with the potential of reducing the number of monitoring sites by up to 12% in areas with the highest population densities and potential expansions of up to 390% in others based on their environmental heterogeneity (de Almeida et al. 2022). Reviewing and potentially reducing the WQMN sampling frequencies, which are currently the same across all sites, could allow spatial network expansions to underrepresented areas. The main objective of this study was to investigate the possibility of sampling frequency reduction in some areas of the São Paulo State WQMN while maintaining a similar degree of information from the original sampling strategy to assess the overall water quality conditions. For this purpose, we used both Kruskal–Wallis and Dunn-Sidak post hoc tests associated with clear objective criteria to assess data redundancies from two-month periods (TMPs) in the current network and potentially reduce the sampling frequencies. These approaches are not specific to the São Paulo State WQMN and could be considered for other WQMNs around the world.

Materials and methods

Study area

The São Paulo State, in Southeastern Brazil, covers approximately 248,200 km² distributed into tropical and subtropical zones (Fig. 1). The state is the most populous (~ 46 million inhabitants, CETESB 2021) and industrialized one in Brazil (IBGE 2020). The hydrography ranges from small catchments to rivers with average discharge greater than 5,000 m³/s and lengths over 1,000 km. The principal cause of surface water pollution is the untreated domestic sewage discharge (ANA 2012; CETESB 2020), but industrial wastewater (Botelho et al. 2013; Alves et al. 2018) and non-point pollution from agricultural areas (Mori et al. 2015; Simedo et al. 2018) are also relevant drivers of water quality degradation. Since 2013, at least 14% of the São Paulo State WQMN sites presented poor or very poor water quality status according to the water quality index (WQI) calculated by CETESB (São Paulo State Environmental Agency, “Companhia Ambiental do Estado de São Paulo”).

Fig. 1 Overview of the São Paulo State (gray) location in relation to South America and Brazil (left) and map of the São Paulo State divided into 22 UGRHIs, with the studied UGRHIs colored (right)

For administrative purposes, the São Paulo State regulation (São Paulo 2016) establishes 22 water resources management units (UGRHIs, “Unidades de Gerenciamento de Recursos Hídricos”). These UGRHIs group homogeneous watersheds by environmental features (e.g., geomorphology, geology, hydrology, and hydrogeology). According to the IBGE (2018) classification, agricultural areas are the main land use in the São Paulo State (~ 40% of the total area). Nonetheless, land use is not homogeneous across the state (Table 1), with predominant uses such as forested or urban in some areas. Population densities are also heterogeneous across the UGRHIs, with values from 22 to 3,699 inhabitants/km² and a strong population concentration in the east portion of the state (CETESB 2021).

Table 1 Land uses and population distribution information from the studied UGRHIs and the São Paulo State in general, with the total area (km²), population (10³ inhabitants), population density (inhabitants/km²), and predominant land uses in terms of area

Our study considered four out of the 22 UGRHIs from the São Paulo State. We selected these UGRHIs because they encompass the range of conditions found in the state in terms of population density and main anthropogenic impacts, so they are representative of other portions of the state. The selected UGRHIs had contrasting predominant land uses: agriculture (UGRHI 14), artificial areas (UGRHI 06), grassland (UGRHI 01), and forest vegetation (UGRHI 11). Additionally, they represent 19% of the state area and 49% of its population (Table 1).

General workflow and database

De Almeida et al. (2022) developed a spatial update proposal employing cluster analysis associated with monitoring goals definition and stratified sampling strategy to identify redundant monitoring sites and to evaluate the spatial representativeness of the network. De Almeida et al. (2022) clustered sites within each UGRHI to find those that were the most similar to each other with respect to WQI parameters. This statistical procedure resulted in 4, 40, 8, and 4 clusters in UGRHIs 01, 06, 11, and 14, respectively. The WQMN spatial update proposal by the authors indicated redundant monitoring sites that could be removed from the network based on statistics, monitoring goals, and spatial representativeness. However, they did not address the temporal dimension of the studied WQMN, which we consider here for the most representative sites of each cluster. Most representative sites were those maintained in the network after the spatial update proposal that presented the lowest sum of linkage distances (also called clustroid) in each cluster. Such distances were calculated using the function Hierarchical Cluster Analysis of the OriginPro 2016® software.

Here, we used data from 2004 to 2018 for 56 monitoring sites across 40 rivers for the WQMN sampling frequency analysis (see details in Table S1 of Online Resource 1). These sites were monitored with a bimonthly frequency by CETESB as part of the São Paulo State WQMN in UGRHIs 01, 06, 11, and 14 (Infoáguas Online System, https://sistemainfoaguas.cetesb.sp.gov.br/). Data on Escherichia coli (E. coli), pH, biochemical oxygen demand (BOD), total nitrogen, total phosphorus, turbidity, total solids, temperature, and dissolved oxygen (DO) were compiled for these monitoring sites. These parameters make up the WQI calculated by CETESB, an adaptation of a previous index for the United States (Ramos et al. 2016). The laboratory analyses were performed by CETESB, with ISO/IEC 17025 (ISO/IEC 2017) sampling and laboratory accreditation by the National Institute of Metrology, Standardization and Industrial Quality, following Standard Methods (APHA et al. 1998, 2005, 2012, 2017). Changes in analytical methods due to the Standard Methods updates are not clearly highlighted in the Infoáguas Online System. This is a limitation of the input dataset since water quality parameter concentrations can be influenced by the different methods used across the years.

The database used by de Almeida et al. (2022) was the input for developing the WQMN sampling frequency recommendation. The steps followed for data treatment in each UGRHI were (1) identification and evaluation of outliers by interquartile range method (Naghettini and Pinto 2007), (2) exclusion of parameters with more than 10% of missing data as recommended by Olsen et al. (2012) and Calazans et al. (2018b), (3) exclusion of parameters with more than 80% of censored data (below the respective Limits of Quantification), and (4) substitution of censored data by the quantification limit. All WQI parameters were considered suitable for the subsequent analyses and are hereafter referred to as the “approved data points.” Specifically for UGRHI 11, the BOD and TN presented more than 80% of censored data. For this UGRHI, the BOD was not used for the sampling frequency recommendation, and the TN was replaced by the sum of total Kjeldahl nitrogen and nitrate.
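The four data treatment steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run by the authors: the function names, the `"<LOQ"` marker for censored values, the fence multiplier `k = 1.5`, and the choice to compute the censored share over non-missing records are all hypothetical. Outliers are only flagged, since the text says they were identified and evaluated rather than automatically removed.

```python
from statistics import quantiles

def iqr_outlier_flags(values, k=1.5):
    """Step 1: flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is assumed)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]

def screen_parameter(records, loq, max_missing=0.10, max_censored=0.80):
    """Steps 2-4 for one parameter in one UGRHI.

    records: list of floats, None (missing), or "<LOQ" (censored; hypothetical marker).
    Returns the cleaned series, or None when the parameter is excluded.
    """
    missing = sum(r is None for r in records)
    if missing / len(records) > max_missing:       # step 2: too much missing data
        return None
    observed = [r for r in records if r is not None]
    censored = sum(r == "<LOQ" for r in observed)
    if censored / len(observed) > max_censored:    # step 3: too much censored data
        return None
    # step 4: replace censored entries by the limit of quantification
    return [loq if r == "<LOQ" else float(r) for r in observed]

series = [1.0, "<LOQ", 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, None]
cleaned = screen_parameter(series, loq=0.5)
print(cleaned)
print(iqr_outlier_flags([1, 2, 3, 4, 100]))
```

A parameter that returns `None` here corresponds to one dropped from the analysis, as happened with BOD in UGRHI 11.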

Statistical analyses and WQMN sampling frequency recommendation

According to the Köppen-Geiger classification (Kottek et al. 2006), the dominant climate in São Paulo State is tropical wet with dry winters (Aw), with average annual rainfall between 1,250 and 2,250 mm (De Souza Rolim et al. 2007). The dry season encompasses April to September, while October to March is the rainy season (Alves et al. 2005; Barbieri et al. 2004; Luz 2010) when about 80% of annual rainfall occurs (Luz 2010). Dilution capacity of the water bodies increases with high discharge (Quilbé et al. 2006; Zucco et al. 2012; Guo et al. 2019), but rainfall can exacerbate the contribution of nonpoint sources of pollution (Fraser et al. 1999; Manley et al. 2004; Shigaki et al. 2007; Zhang et al. 2012). To represent these contrasting seasons, we assumed two annual samplings (one in the dry and the other in the wet season) as the minimum sampling frequency for the São Paulo State WQMN. Thus, we evaluated the redundancy of TMPs separately for the dry and the wet seasons. Wet season TMPs were October/November, December/January, and February/March, while the pairs April/May, June/July, and August/September made up the dry season.
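As a small illustration of this grouping, the sketch below assigns a calendar month to its TMP and season. The TMP numbers 1 to 6 follow the labeling used in the caption of Table 4; the dictionary and function names are hypothetical.

```python
# TMP index for each calendar month: 1 = Oct/Nov, 2 = Dec/Jan, 3 = Feb/Mar (wet),
# 4 = Apr/May, 5 = Jun/Jul, 6 = Aug/Sep (dry).
TMP_BY_MONTH = {10: 1, 11: 1, 12: 2, 1: 2, 2: 3, 3: 3,
                4: 4, 5: 4, 6: 5, 7: 5, 8: 6, 9: 6}

def tmp_and_season(month):
    """Return (TMP index, season) for a month number from 1 to 12."""
    tmp = TMP_BY_MONTH[month]
    season = "wet" if tmp <= 3 else "dry"
    return (tmp, season)

print(tmp_and_season(1))
print(tmp_and_season(9))
```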

For each cluster and WQI parameter, we carried out the Kruskal–Wallis hypothesis test (α = 0.05) to evaluate data redundancy among the TMPs. This test indicates the presence or absence of statistically significant differences among more than two samples, but it is insufficient to identify which samples are different (Rafter et al. 2002; Dinno 2015). Thus, we applied a post hoc test after Kruskal–Wallis to address this gap. We chose the Dunn-Sidak (Sidak 1967) post hoc test (α = 0.05) because it leverages the rank sums from the previous Kruskal–Wallis test (Dunn 1964; Dinno 2015; Lee and Lee 2018) and also provides low rates of type I error (Rafter et al. 2002; Ozkaya and Ercan 2012). The statistical tests were performed in OriginPro 2016® and MATLAB 2015a. Non-parametric tests were chosen because the data series for some water quality parameters did not fit a normal distribution according to previous Shapiro–Wilk normality tests (α = 0.05). For each parameter, we ran three comparisons between the TMPs from the dry season and three others between the TMPs from the wet season (Fig. 2). Thus, each season had a maximum of three identified differences, reached when all comparisons among the TMPs returned statistically significant differences, and a minimum of zero, when none of the comparisons did. Considering the identified differences, we applied objective criteria to indicate the sampling frequency requirements for each parameter and cluster (Table 2). For example, when three statistically significant differences were identified, we suggested a minimum of three samplings in the season.
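To make the testing sequence concrete, here is a minimal pure-Python sketch of a Kruskal–Wallis test followed by Dunn's pairwise comparisons with a Šidák adjustment, for the three TMPs of one season. It is not the OriginPro or MATLAB code used in the study, and the function names are hypothetical. It assumes exactly three groups, so the chi-square p-value with two degrees of freedom reduces to exp(−H/2), and it uses the normal approximation with the usual tie correction.

```python
from itertools import combinations
from math import erfc, exp, sqrt

def average_ranks(values):
    """Rank values 1..N, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, i = [0.0] * len(values), 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_dunn_sidak(groups, alpha=0.05):
    """groups: dict with three TMP labels -> list of measurements.

    Returns (Kruskal-Wallis p-value, {(tmp_a, tmp_b): significant?}).
    """
    labels = list(groups)
    assert len(labels) == 3, "the exp(-H/2) shortcut holds only for 3 groups"
    pooled = [v for g in labels for v in groups[g]]
    n_total, ranks = len(pooled), average_ranks(pooled)
    mean_rank, size, start = {}, {}, 0
    for g in labels:
        size[g] = len(groups[g])
        mean_rank[g] = sum(ranks[start:start + size[g]]) / size[g]
        start += size[g]
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_sum = sum(t ** 3 - t for t in counts.values())
    # Kruskal-Wallis H with tie correction (fails if every value is identical)
    h = 12 / (n_total * (n_total + 1)) * sum(
        size[g] * mean_rank[g] ** 2 for g in labels) - 3 * (n_total + 1)
    h /= 1 - tie_sum / (n_total ** 3 - n_total)
    p_kw = exp(-h / 2)
    # Dunn's z statistics on mean ranks, Sidak-adjusted over the 3 comparisons
    var = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    pairs = list(combinations(labels, 2))
    significant = {}
    for a, b in pairs:
        z = (mean_rank[a] - mean_rank[b]) / sqrt(var * (1 / size[a] + 1 / size[b]))
        p_raw = erfc(abs(z) / sqrt(2))          # two-sided normal p-value
        p_adj = 1 - (1 - p_raw) ** len(pairs)   # Sidak adjustment
        significant[(a, b)] = p_adj < alpha
    return p_kw, significant

wet = {"Oct/Nov": list(range(1, 11)),
       "Dec/Jan": list(range(11, 21)),
       "Feb/Mar": list(range(21, 31))}
p_value, differences = kruskal_dunn_sidak(wet)
print(p_value, sum(differences.values()))
```

The count of `True` entries is the number of identified differences (0 to 3) that feeds the criteria in Table 2. The input values in the example are synthetic and chosen only to produce clearly separated groups.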

Fig. 2 Comparisons between the two-month periods (TMPs) run in the Dunn-Sidak post hoc test for the wet and dry seasons in the São Paulo State

Table 2 Criteria for defining the number of recommended samplings for the wet and dry seasons based on the total number of statistically significant differences identified in the Kruskal–Wallis followed by the Dunn-Sidak post hoc

We did not consider water temperature when defining the sampling frequencies because this parameter is especially sensitive to seasonal variations (Dallas 2008; Cruz et al. 2019; Nam et al. 2021), even though such variation is less pronounced in the study area than in temperate regions.

A sampling frequency recommendation for each cluster and parameter would be impracticable from an operational point of view. For this reason, we compiled the results for each parameter to develop a common sampling frequency recommendation for each UGRHI. For each parameter, we considered the number of recommended samplings for each season as the one consistently observed for at least 95% of the clusters. Following this conservative criterion, the final sampling frequency suggested for each UGRHI was the highest among all the parameters analyzed in this study.
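The aggregation rule can be sketched as follows. The 95% criterion is read here as "the smallest number of samplings that is sufficient for at least 95% of the clusters", which is one interpretation of the text rather than a confirmed detail of the authors' procedure, and the function names are hypothetical. The per-cluster values reproduce the UGRHI 11 cases described in the Results (two samplings only for E. coli in cluster 4, total phosphorus in cluster 7, and turbidity in clusters 1, 4, and 8 in the wet season; total solids in cluster 7 and DO in cluster 4 in the dry season).

```python
def samplings_for_parameter(per_cluster, coverage=0.95):
    """Smallest k such that at least `coverage` of the clusters need <= k samplings."""
    for k in sorted(set(per_cluster)):
        if sum(c <= k for c in per_cluster) / len(per_cluster) >= coverage:
            return k

def ugrhi_frequency(wet, dry, coverage=0.95):
    """wet, dry: dict parameter -> list of recommended samplings per cluster.

    Returns (wet samplings, dry samplings, annual samplings), taking the
    highest requirement among all parameters in each season.
    """
    wet_n = max(samplings_for_parameter(v, coverage) for v in wet.values())
    dry_n = max(samplings_for_parameter(v, coverage) for v in dry.values())
    return wet_n, dry_n, wet_n + dry_n

N_CLUSTERS = 8  # clusters in UGRHI 11

def with_twos(*clusters):
    """One sampling everywhere except the listed clusters, which need two."""
    return [2 if c in clusters else 1 for c in range(1, N_CLUSTERS + 1)]

parameters = ["E. coli", "pH", "TKN+NO3", "TP", "Turb", "TS", "DO"]
wet = {p: with_twos() for p in parameters}
dry = {p: with_twos() for p in parameters}
wet.update({"E. coli": with_twos(4), "TP": with_twos(7), "Turb": with_twos(1, 4, 8)})
dry.update({"TS": with_twos(7), "DO": with_twos(4)})

print(ugrhi_frequency(wet, dry))  # (2, 2, 4)
```

With only eight clusters, a single cluster requiring two samplings already pushes the share of clusters satisfied by one sampling below 95%, which illustrates why the criterion is conservative.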

After the post hoc test, we calculated the ratio (hereafter “participation ratio”) between the number of statistically significant differences identified and the total number of potential statistical differences for each parameter and TMP (see an example in Eq. 1 considering BOD). If two samplings were recommended for the season, we suggested sampling the TMPs with the highest average participation ratios (Eq. 2). The objective was to select the TMPs with the most contrasting water quality conditions in each season. The participation ratios for temperature were used as tiebreakers when two or more TMPs presented the same average participation ratio. In this case, we suggested sampling the TMPs with the highest participation ratio for temperature. Thus, the WQMN sampling frequency recommendation was composed of the number of suggested samplings for each season and the priority TMPs for sampling. Figure 3 summarizes the workflow employed in the present study.

$$R_{\text{BOD, TMP}} = \frac{n_{\text{BOD, TMP}}}{N_{\text{BOD, TMP}}} \times 100$$
(1)

where:

R_{BOD, TMP} = participation ratio of the TMP for BOD.

n_{BOD, TMP} = number of statistically significant differences identified after the post hoc test for the TMP considering BOD.

N_{BOD, TMP} = number of potential statistically significant differences in the post hoc test for the TMP considering BOD. Within a UGRHI with four clusters, for example, the number of potential statistical differences is eight (the number of clusters multiplied by the number of comparisons in which the TMP participates).

Fig. 3 Workflow summary for the sampling frequency recommendation for each water resources management unit (UGRHI). The maintained monitoring sites, the clusters, and the approved data points came from the spatial update proposal developed by de Almeida et al. (2022)

$$R_{\text{average, TMP}} = \frac{R_{\text{E. coli, TMP}} + R_{\text{BOD, TMP}} + R_{\text{pH, TMP}} + R_{\text{TN, TMP}} + R_{\text{TS, TMP}} + R_{\text{TP, TMP}} + R_{\text{DO, TMP}}}{p}$$
(2)

where:

R_{average, TMP} = average participation ratio of the TMP of interest;

R_{E. coli, TMP}, R_{BOD, TMP}, …, R_{DO, TMP} = participation ratios of the TMP of interest for each parameter in each UGRHI (according to Eq. 1);

p = total number of analyzed parameters.
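Equations 1 and 2 and the temperature tiebreaker can be illustrated with the short sketch below. The function names are hypothetical, and the default of two comparisons per TMP reflects the fact that each TMP takes part in two of the three pairwise comparisons within its season. The tie example uses the values reported in the Results for UGRHI 11 (average participation ratio of 1% for both June/July and August/September, with temperature ratios of 81% and 31%).

```python
def participation_ratio(n_significant, n_clusters, comparisons_per_tmp=2):
    """Eq. 1: percentage of a TMP's potential differences that were significant."""
    n_potential = n_clusters * comparisons_per_tmp
    return 100 * n_significant / n_potential

def average_ratio(ratios_by_parameter):
    """Eq. 2: mean participation ratio over the p analyzed parameters."""
    return sum(ratios_by_parameter.values()) / len(ratios_by_parameter)

def prioritize(avg_ratio, temperature_ratio, how_many):
    """Pick the TMPs with the highest average ratio; ties go to the TMP
    with the highest participation ratio for temperature."""
    ranked = sorted(avg_ratio,
                    key=lambda tmp: (avg_ratio[tmp], temperature_ratio[tmp]),
                    reverse=True)
    return ranked[:how_many]

# One significant difference in a UGRHI with four clusters: 1 of 8 potential ones.
print(participation_ratio(n_significant=1, n_clusters=4))  # 12.5

# Tie between two dry-season TMPs, resolved by temperature.
tied = {"Jun/Jul": 1, "Aug/Sep": 1}
temperature = {"Jun/Jul": 81, "Aug/Sep": 31}
print(prioritize(tied, temperature, how_many=1))  # ['Jun/Jul']
```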

Finally, we ran the Mann–Whitney non-parametric test with a significance level of 0.05 for each WQI parameter. This statistical test indicates whether two samples belong to the same population or not. We aimed to look for statistical differences or similarities in the data structure before and after the optimization by comparing the original bimonthly data series with the data series generated with the recommended sampling frequencies.
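A minimal sketch of this check is given below, assuming each record is a (TMP index, value) pair and using a two-sided Mann–Whitney test with the normal approximation and tie correction, without continuity correction. It is not the software routine used in the study; the function names and the synthetic data are hypothetical.

```python
from math import erfc, sqrt

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney p-value (normal approximation, tie-corrected)."""
    pooled = sorted(x + y)
    rank_of, tie_sum, i = {}, 0, 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1   # average rank of tied values
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    n1, n2, n = len(x), len(y), len(pooled)
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / sqrt(var)
    return erfc(abs(z) / sqrt(2))

def reduced_series(records, kept_tmps):
    """Keep only the values sampled in the recommended TMPs."""
    return [value for tmp, value in records if tmp in kept_tmps]

# Synthetic bimonthly series: two years, six TMPs per year.
records = [(tmp, float(10 + tmp + year)) for year in range(2) for tmp in range(1, 7)]
original = [value for _, value in records]
reduced = reduced_series(records, kept_tmps={1, 3, 4, 5})

p = mann_whitney_p(original, reduced)
print("similar data structure" if p >= 0.05 else "different data structure")
```

A p-value at or above 0.05 means the test does not detect a difference between the original series and the series obtained with the reduced frequency.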

Results

Approved data points

For the WQMN sampling frequency recommendation, more than 32,000 approved data points for the eight water quality parameters were considered, resulting in an average of ~ 1,021 data points per parameter in each UGRHI (Table 3). The approved data points had an average annual sampling frequency of six samplings for all the analyzed parameters in all UGRHIs, preserving the original bimonthly frequency from CETESB’s monitoring scheme (see details in Table S2 of Online Resource 1). The data availability was heterogeneous due to different monitoring site densities and the contrasting starting dates of operation. In addition, the data points were well-distributed across the wet and dry seasons for all UGRHIs and parameters. The medians suggested considerable variation in the water quality across the selected UGRHIs, with the worst water quality in UGRHI 06 (Table 3). For this UGRHI, parameters frequently related to domestic sewage discharge (E. coli, BOD, total nitrogen, total phosphorus, and low DO) indicated a worse water quality in the dry season as compared with the wet season, a pattern not observed for the other UGRHIs. The WQI also showed the worst water quality condition in UGRHI 06 (32 ± 26, average ± standard deviation), classified as poor. For the other UGRHIs, the average WQIs were classified as good, with values of 54 ± 8, 63 ± 10, and 57 ± 14 for UGRHIs 01, 11, and 14, respectively.

Table 3 Overview of the approved data points to the WQMN sampling frequency recommendation with the total number of data points, numbers of data from wet and dry seasons, and descriptive statistics for data on Escherichia coli (E. coli), pH, biochemical oxygen demand (BOD), total nitrogen (TN), total phosphorus (TP), turbidity (Turb), total solids (TS), and dissolved oxygen (DO) for all water resources management units (UGRHIs)

Identification of statistically significant differences and sampling frequency requirements

For each water quality parameter, the Kruskal–Wallis (α = 0.05) followed by the Dunn-Sidak post hoc test identified the statistically significant differences among the TMPs from the wet season and among the TMPs from the dry season. To exemplify our workflow, we present here the details for the UGRHI 11 (Table 4). In that UGRHI, we observed the absence of statistical differences among the TMPs for most of the clusters. For the wet season, one statistical difference was identified for E. coli (cluster 4), total phosphorus (cluster 7), and turbidity (clusters 1, 4, and 8), while for the dry season, one difference was observed for total solids (cluster 7) and DO (cluster 4).

Table 4 Statistically significant differences (α = 0.05) between the two-month periods (TMPs) identified after the Dunn-Sidak post hoc test for each parameter in UGRHI 11. The numbers of recommended samplings for each parameter in the seasonal and annual scales are also presented, based on the statistically significant differences identified after the Dunn-Sidak post hoc test. The results are presented for each cluster considering data on Escherichia coli (E. coli), pH, the sum of total Kjeldahl nitrogen and nitrate (TKN+NO3), total phosphorus (TP), turbidity (Turb), total solids (TS), and dissolved oxygen (DO). The two-month periods are identified as 1 (October/November), 2 (December/January), 3 (February/March), 4 (April/May), 5 (June/July), and 6 (August/September). The symbols “-” and “+” indicate the absence or presence of statistically significant differences, respectively. Total differences equal to or higher than one in the wet and dry seasons are marked in bold. The clusters came from the spatial update proposal developed by de Almeida et al. (2022)

We applied the criteria described in the methodology (Table 2) to indicate the number of recommended samplings for each season and water quality parameters. Again, we only present here the results for the UGRHI 11 to exemplify our workflow (Table 4). Due to the absence of statistical differences in the UGRHI 11, the wet and dry seasons would require only one sampling each for most of the parameters. Thus, two annual samplings would be feasible for most of the parameters in this UGRHI since only seven out of 56 cases analyzed indicated the number of annual samplings would need to be higher than two. The maximum of two samplings was indicated for the wet season in UGRHI 11 for E. coli (cluster 4), total phosphorus (cluster 7), and turbidity (clusters 1, 4, and 8). Two samplings were also the maximum indicated for the dry season for total solids (cluster 7) and DO (cluster 4).

Figure 4 summarizes the sampling requirements for each parameter in the wet and dry seasons, showing the sampling recommendations associated with the respective percentage of clusters in each UGRHI. In general, one sampling per season could capture annual mean values in most clusters, either in the wet or dry seasons. All clusters in the UGRHI 01 presented this possibility. Nonetheless, the other UGRHIs had a wider range of acceptable sampling frequencies, with one sampling being enough in the wet season for 50 to 100% of the clusters. For this season, UGRHI 14 presented the lowest percentage of clusters (50%) where one sample would suffice, namely, for E. coli, turbidity, and DO (Fig. 4g). Considering the dry season, one sampling would suffice for 0 to 100% of the clusters. Similar to what was observed for the wet season, UGRHI 14 presented the lowest percentage of clusters where one sample would be enough, with two samplings recommended for turbidity in all clusters (Fig. 4h).

Fig. 4 Sampling frequency recommendations for each parameter in the wet and dry seasons associated with the percentage of clusters in UGRHIs 01 (a, b), 06 (c, d), 11 (e, f), and 14 (g, h). The results are presented for Escherichia coli (E. coli), pH, biochemical oxygen demand (BOD), total nitrogen (TN), sum of total Kjeldahl nitrogen and nitrate (TKN + nitrate), total phosphorus (TP), turbidity, total solids (TS), and dissolved oxygen (DO). The clusters came from the spatial update proposal developed by de Almeida et al. (2022), grouping similar monitoring sites concerning the WQI parameters

Sampling frequency recommendation

The sampling frequency recommended for each parameter was the one identified in our analyses as necessary in at least 95% of the clusters for the wet and the dry seasons. We found contrasting sampling requirements across the different UGRHIs (Table 5). For example, in UGRHI 01, one sampling would be enough for E. coli and DO in the dry season, while in UGRHI 14, three samplings would be required for these parameters for the same period. Turbidity, DO, E. coli, and BOD were the parameters that required at least two samplings in the wet season for the highest number of UGRHIs (two). In the dry season, DO most frequently required at least two samplings (in three UGRHIs), followed by E. coli and total solids (in two UGRHIs).

Table 5 Sampling frequencies for the wet and the dry seasons for each water quality parameter, totals for each season, and annual totals recommended in the studied UGRHIs. The parameters analyzed were Escherichia coli (E. coli), pH, biochemical oxygen demand (BOD), total nitrogen (TN), total phosphorus (TP), turbidity (Turb), total solids (TS), and dissolved oxygen (DO)

For each UGRHI, the final recommended sampling frequency was based on the maximum for all parameters. A minimum of two annual samplings was recommended for UGRHI 01, which presents the second highest population density among the studied UGRHIs and grassland as the predominant land use (Table 1). On the other hand, sampling frequency should not be reduced in UGRHI 14 (remaining at six annual samplings), which has the third greatest population density and where land use is predominantly agricultural. Our results also showed the possibility of sampling frequency reduction in UGRHIs 06 and 11, with a final recommendation of four annual samplings.
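The aggregation step can be sketched as follows. This is an illustration under two assumptions: the seasonal total is the maximum across parameters, and the annual total is the sum of the wet and dry season totals (which is consistent with the annual totals of two, four, and six reported here). The function name and the per-parameter values are hypothetical, not the entries of Table 5.

```python
def seasonal_and_annual(freq_by_parameter):
    """Aggregate per-parameter sampling frequencies into seasonal and annual totals.

    freq_by_parameter: dict mapping a parameter name to a dict with the
    recommended number of samplings in the 'wet' and 'dry' seasons.
    """
    wet = max(f["wet"] for f in freq_by_parameter.values())
    dry = max(f["dry"] for f in freq_by_parameter.values())
    return {"wet": wet, "dry": dry, "annual": wet + dry}


# Hypothetical per-parameter recommendations for one UGRHI.
example = {
    "E. coli": {"wet": 2, "dry": 1},
    "DO": {"wet": 1, "dry": 2},
    "pH": {"wet": 1, "dry": 1},
}
print(seasonal_and_annual(example))  # {'wet': 2, 'dry': 2, 'annual': 4}
```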

The participation ratios for the TMPs remained below 10% for most of the parameters in each UGRHI except for the UGRHI 14 (Table 6). This condition could be partially explained by the reduced number of statistical differences identified after the Dunn-Sidak post hoc test, which indicated the similarity of the data points regardless of the sampling period. The average participation ratios ranged from 0% (i.e., no statistically significant differences identified among the TMPs in UGRHI 01) to 23% (i.e., for the parameters analyzed in UGRHI 14, on average, 23% of the comparisons for August/September in the post hoc test presented statistically significant differences) (Table 6). In addition to the highest participation ratios, the UGRHI 14 also showed the lowest proportions of clusters where only one sampling per season was recommended (Fig. 4 g and h).
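As a rough illustration of how a participation ratio could be computed, the sketch below assumes that the ratio of a TMP is the percentage of post hoc pairwise comparisons involving that TMP that were statistically significant, pooled over clusters. The input format, the function name, and the comparison outcomes are hypothetical.

```python
from collections import defaultdict


def participation_ratios(comparisons):
    """Percentage of significant pairwise comparisons in which each TMP took part.

    comparisons: iterable of (tmp_a, tmp_b, is_significant) tuples, one per
    post hoc pairwise comparison (hypothetical input format).
    """
    total = defaultdict(int)
    significant = defaultdict(int)
    for tmp_a, tmp_b, is_significant in comparisons:
        for tmp in (tmp_a, tmp_b):
            total[tmp] += 1
            if is_significant:
                significant[tmp] += 1
    return {tmp: 100.0 * significant[tmp] / total[tmp] for tmp in total}


# Hypothetical dry-season outcomes for two clusters and one parameter.
outcomes = [
    ("April/May", "June/July", False),
    ("April/May", "August/September", True),
    ("June/July", "August/September", False),
    ("April/May", "June/July", False),
    ("April/May", "August/September", False),
    ("June/July", "August/September", False),
]
print(participation_ratios(outcomes))
```

In this made-up case, April/May and August/September each take part in one significant comparison out of four (25%), while June/July takes part in none (0%).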

Table 6 Participation ratios for the two-month periods (TMPs) for each parameter and UGRHI. The bold-underlined average participation ratios represent the two-month periods that should be prioritized for sampling

Prioritizing TMPs for sampling was unnecessary for UGRHI 14, since the sampling frequency reduction was unfeasible, as well as for UGRHI 01, where statistically significant differences were absent and the samplings could run in any TMP. For UGRHI 06, October/November and December/January should be prioritized for sampling in the wet season since they had the highest average participation ratios (5% and 3%, respectively). All dry season TMPs had the same average participation ratio in UGRHI 06 (2%), but we recommended sampling in June/July and August/September due to the highest participation ratios for temperature (59% and 35%, respectively). For UGRHI 11, we suggested sampling in October/November and February/March in the wet season, and in April/May and June/July in the dry season. In this UGRHI, there was a tie between June/July and August/September concerning the average participation ratios (1%), but June/July was suggested for sampling due to its higher participation ratio for temperature (81%) compared to August/September (31%).
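The prioritization logic described above, including the tie-break on temperature, can be sketched in a few lines of Python. The function name and data layout are hypothetical; the example uses only the values reported for the two tied dry-season TMPs in UGRHI 11.

```python
def prioritize_tmps(tmps, k):
    """Return the names of the k TMPs to prioritize for sampling.

    tmps: list of dicts with 'name', 'avg_ratio' (average participation
    ratio, %) and 'temp_ratio' (participation ratio for temperature, %).
    Ties in 'avg_ratio' are broken by the higher 'temp_ratio'.
    """
    ranked = sorted(
        tmps, key=lambda t: (t["avg_ratio"], t["temp_ratio"]), reverse=True
    )
    return [t["name"] for t in ranked[:k]]


# Values reported in the text for the two tied dry-season TMPs in UGRHI 11.
tied_ugrhi_11 = [
    {"name": "August/September", "avg_ratio": 1, "temp_ratio": 31},
    {"name": "June/July", "avg_ratio": 1, "temp_ratio": 81},
]
print(prioritize_tmps(tied_ugrhi_11, k=1))  # ['June/July']
```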

The results of the Mann–Whitney non-parametric test (α = 0.05) showed that the recommended sampling frequencies did not significantly change the structure of the data series for any of the studied UGRHIs. All 31 comparisons run for the WQI parameters indicated the absence of statistically significant differences between the data series before and after the sampling frequency optimization. This indicated that the data remained similar even with the potential sampling frequency reduction in UGRHIs 01, 06, and 11.
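For readers who wish to reproduce this type of before/after comparison, a minimal pure-Python sketch of a two-sided Mann–Whitney test is given below. It assumes the normal approximation with tie correction and no continuity correction, which is only a rough guide for small samples; it is not necessarily the implementation used in this study, and the data values are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney test; returns (U, p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0  # all values identical
    z = (u - n1 * n2 / 2) / sqrt(variance)
    return u, min(erfc(abs(z) / sqrt(2)), 1.0)


# Hypothetical series: six bimonthly values versus a reduced subset of three.
original = [1, 2, 3, 4, 5, 6]
reduced = [1, 3, 5]
u_stat, p_value = mann_whitney_u(original, reduced)
print(u_stat, p_value > 0.05)  # 7.5 True
```

A p-value above α = 0.05 means the test does not detect a difference between the two series, which is the outcome reported for all 31 comparisons.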

Discussion

Our results reinforced the possibility of sampling frequency updates in the study area. This opportunity was also observed by CETESB in an internal statistical study conducted in parallel with this paper (Rugue Junior et al. 2020), which resulted in the sampling frequencies being reduced from bimonthly to quarterly in all UGRHIs after 2020. The optimization method employed by CETESB focused on identifying the water quality seasonality based on data from the monitoring sites operated in UGRHIs 05 and 06. For this purpose, the function Phenophase in the R software was run, followed by comparisons between the data series before and after the optimization to evaluate changes in the rivers' water quality status based on the WQI and in the patterns of legal limit violations. Our study had a more customized approach to each UGRHI, which could complement the work developed by Rugue Junior et al. (2020), and we suggest the need for more individualized monitoring strategies for each UGRHI. This need is especially crucial for UGRHI 14, which presented the highest proportions of clusters for which more than one sampling was recommended in each of the wet and dry seasons for E. coli, total phosphorus, turbidity, and DO (Fig. 4 g and h).

Other WQMNs worldwide have undertaken similar analyses of appropriate sampling frequencies (Table 7). Some criteria used to indicate the potential for sampling frequency reduction were consistent data before and after the optimization, with water quality status (e.g., high, good, moderate, poor, and bad) remaining the same for all monitoring sites (Naddeo et al. 2007; 2013; Scannapieco et al. 2012; Liu et al. 2014); statistical similarities among sampling months, considering specific water quality parameters to meet the monitoring goals (Peña-Guzmán et al. 2019); and optimized sampling frequencies that provide accurate estimates (e.g., less than 5% error) of water quality indicators compared with values obtained from reference sampling frequencies (Vilmin et al. 2018). In the present study, the recommended sampling frequency reduction was associated with the absence of statistically significant differences from the TMPs’ data series, suggesting data redundancy.

Table 7 Overview of studies on water quality monitoring network sampling frequency optimization compared to the present study. For each reference, we present the study area (river/country), the study area characteristics, the optimization methods employed, the drainage area, the initial and the final proposed sampling frequencies

UGRHIs 01 and 14 presented the most contrasting sampling frequency recommendations, with two and six suggested annual samplings, respectively. Other authors have documented correlations between water quality variation and land use (Yu et al. 2016; Shi et al. 2017; Xu et al. 2019). Agricultural areas, such as in UGRHI 14, can increase water quality variation due to pollutant loads from surface and subsurface runoff (Connolly et al. 2015; Yu et al. 2016; Wu et al. 2020; Badrzadeh et al. 2022), while forested and grassland areas (UGRHI 01) contribute to more stable conditions, intercepting and retaining pollutants as well as preventing soil erosion (Sliva and Dudley Williams 2001; Shi et al. 2017; Lei et al. 2021). These factors could partially explain the different recommended sampling frequencies in the study area, especially for UGRHIs 01 and 14, where contrasting land use and population distribution are present (Table 1). Future research on the key drivers affecting water quality temporal variation in the studied UGRHIs could benefit the São Paulo State water resources management.

Past studies reinforce our findings, highlighting that different monitoring sites designed to meet the same goals can demand contrasting monitoring strategies depending upon the water quality variability (natural or anthropogenic). The main aspects considered for the flexible sampling frequencies were: higher sampling demand in areas with water quality worsening trends (Naddeo et al. 2007; 2013; Scannapieco et al. 2012); contrasting seasonal conditions in water basins (e.g., climate, streamflow) (Sokolov et al. 2020; da Luz et al. 2022); and anthropogenic pressures in water quality (e.g., point sources of pollution) (Vilmin et al. 2018; Peña-Guzmán et al. 2019). Our more customized approach associated with the evaluation of sampling requirements by clusters of similar monitoring sites concerning the WQI parameters could aid others in incorporating water quality variability into the planning of adaptive WQMNs.

Dissolved oxygen and E. coli were the parameters that we suggest need greater sampling frequency in both the wet and dry seasons in most of the studied UGRHIs. Temporal variability of these parameters was reported by other researchers and could partially explain the need for greater frequency of sampling in the present study. Rainfall and streamflow seasonality, as well as changes in temperature, were some of the main causes for DO (He et al. 2011; Ogwueleka 2015; Post et al. 2018; Vilmin et al. 2018; Ogwueleka and Christopher 2020; Zhi et al. 2021) and E. coli (Schilling et al. 2009; Iqbal et al. 2017; Muirhead and Meenken 2018; Jeon et al. 2020) variations in rivers from contrasting climates (e.g., tropical, temperate, semi-arid) and land uses (e.g., agricultural, pasture, forest, artificial). Rainfall and streamflow seasonality as well as temperature variation can be even more important in tropical and subtropical areas like the São Paulo State, where the temperature and annual rainfall are relatively high but seasonally variable. In general, these previous studies observed the highest E. coli levels in the periods with the highest temperature, river discharge, and rainfall. We also found that E. coli median values were greatest in the wet season in three out of four UGRHIs (Table 3), a season characterized by the highest rainfall, river discharge, and temperature. The E. coli median value was greater in the dry season in UGRHI 06, where only 48% of the sewage from more than 21 million inhabitants is treated (CBH-AT 2021), and the lack of dilution of untreated sewage probably overrode the surface runoff contribution.

Schilling et al. (2009) indicated that the period with higher rainfall and river discharge in the Raccoon River basin (USA) was associated with higher E. coli concentrations. For example, the E. coli median concentration in the upper 25% of river discharge values was nearly eight times higher than the median concentration in the lower 75%. Despite this overall condition, the positive linear correlation from daily data was not particularly strong (r² = 0.35) between the E. coli concentration and river discharge. According to Schilling et al. (2009), the rain in the dry periods may not increase river discharge much, but runoff can still transport fecal material into the water bodies. They also highlighted that the point sources of pollution (e.g., cattle in streams and discharge of wastewater treatment plants) could generate local peaks of E. coli concentration unrelated to streamflow. Local aspects could be especially relevant for the E. coli variation in rivers in developing countries, where there can be less regulation of point (Doughari et al. 2011; Islam et al. 2018; Hiruy et al. 2022) and nonpoint sources (Xue et al. 2018; Iqbal and Hofstra 2019; Mushi et al. 2021) that contribute to elevated E. coli levels in surface waters. Temperature can influence variation in fecal contaminant indicators (e.g., E. coli) by interacting with the different land sources and altering microbial growth rates and other aspects of metabolism (Jeon et al. 2020). This aspect could also be relevant for the study area, where air temperatures ranged from 9 to 38 °C (data not shown).

Vilmin et al. (2018) analyzed the DO variation on the annual scale and observed that 70% of the DO total variability was associated with seasonal variability. Similar results were obtained by He et al. (2011), who observed seasonal effects related to climate and streamflow conditions in the Bow River (Canada). Previous studies agreed on the temperature effects on the DO concentrations, with the lowest temperatures contributing to the increase of DO concentrations as would be expected with greater solubility in colder water (He et al. 2011; Vilmin et al. 2018; Zhi et al. 2021). Nonetheless, there is no consensus concerning seasonal discharge patterns influencing DO concentrations. Some studies reported that the periods with the highest river discharge and rainfall presented higher DO concentrations (He et al. 2011; Ogwueleka 2015; Vilmin et al. 2018), but others reported lower DO concentrations in these periods (Post et al. 2018; Ogwueleka and Christopher 2020). The complexity of temporal variation of DO concentrations can be attributed to the abiotic and biotic aspects that simultaneously influence the gains and losses of DO in surface waters (He et al. 2011; Zhi et al. 2021). Moreover, DO concentrations can also vary depending upon time of day. Cox (2003), in a review about DO modeling in lowland rivers, indicated that the main abiotic aspects of this balance are equilibration with the atmosphere from super and sub-saturated conditions and enhanced aeration from turbulent flows (e.g., weirs, rapids). In general, these abiotic aspects are influenced by DO solubility, which decreases with the increase in temperature, altitude, and salinity (Henry’s law). Input of hypoxic or anoxic groundwater sources can also decrease DO (Hall and Tank 2005). Aerobic respiration consumes DO and primary production produces it. Dodds et al. (2013) found that even the largest rivers (e.g., the Mississippi, with a drainage area of 2,900,000 km²) can have oxygen variability driven by photosynthesis. Pollution that alters biotic aspects (e.g., organic carbon inputs, interference with light input) can also be important (Cox 2003; He et al. 2011; Zhi et al. 2021).

In the present study, most of the analyzed UGRHIs presented the highest DO median concentrations in the wet season (Table 3), suggesting that processes increasing DO were favored in the period with the highest rainfall, river discharge, and water temperature. These conditions could also stimulate photosynthesis or allow reaeration to offset DO consumption by respiration. The UGRHI 06 presented a contrasting pattern, with the lowest DO median concentration in the wet season. As hypothesized for E. coli, this deviation could be partially attributed to the point and nonpoint sources of pollution on the water quality, especially the untreated sewage in this UGRHI, which increases the biochemical oxygen demand in surface waters due to the high load of organic matter (Hvitved-Jacobsen 1982; Worrall et al. 2019).

Our data series before and after the sampling frequency optimization were consistent for all UGRHIs, with no statistically significant differences for the WQI parameters. This result indicated our sampling frequency recommendation preserved the data structure from the original data series (bimonthly), suggesting that the optimized network could provide a similar degree of information even with reductions of sampling frequencies in three out of four UGRHIs. Consequently, we confirmed the possibility of sampling frequency reduction in some areas of the São Paulo State WQMN. Our optimization strategy was focused on identifying the sufficient sampling frequency to provide a similar degree of information from the original bimonthly sampling to the assessment of overall water quality conditions. We recognize that defining sampling frequencies for monitoring sites intended to regulate or control specific anthropogenic activities (e.g., contamination events such as toxin releases or sporadic episodes related to watershed disturbances) would require a different strategy. These sites demand a customized approach concerning the type of regulated activity, the characteristics of the process to be inspected, and the legal requirements for the monitoring. Additionally, recommendations to decrease sampling frequency are not permanent, and should be revised if substantial changes occur in a watershed (e.g., rapid expansion of population or change in land use, reservoir construction).

Our optimization method considered the existence of two well-defined annual hydrological seasons. This assumption was based on previous meteorological studies, which aimed at determining the rainy period in Southeastern Brazil (Barbieri et al. 2004; Alves et al. 2005). Barbieri et al. (2004) adopted an approach based on the exceedance of precipitation limits, while Alves et al. (2005) considered both the reduction of outgoing long-wave radiation and the exceedance of precipitation limits. Despite the contrasting methods, these previous studies agreed that the rainy period begins in October and ends in March. Research initiatives that analyze the precipitation patterns (with data from rain gauges and/or satellites) and the potential effects of climate change could complement the present study, confirming the presence of two well-defined annual hydrological seasons in the studied UGRHIs. Future research initiatives should also pay more attention to the seasons for which the recommendation was to maintain the original sampling frequency, since more samplings could be necessary for a better representation of temporal variation in water quality.

We argue that reproducing our methodology in the same area, but considering datasets with different monitoring strategies (e.g., monthly sampling frequency, flow-proportional sampling, automated monitoring, adding or excluding WQMN monitoring sites), could lead to contrasting results in the statistical tests. This is possible because different data series can represent other sources of water quality variation not captured in the bimonthly monitoring (e.g., seasonal effects, biogeochemical interactions, discharge fluctuations, anthropogenic activities). Additionally, we emphasize that the statistical analyses employed in the present study (Kruskal–Wallis followed by a post hoc test) require at least three groups for comparison. Thus, our methodology is suitable for optimizing existing WQMNs with sampling frequencies equal to or higher than three per period of interest. Alternative approaches should be used for lower sampling frequencies (e.g., Mann–Whitney hypothesis testing).
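To make the three-group case concrete, the following sketch computes the Kruskal–Wallis H statistic for the three TMPs of one season. It assumes the chi-square approximation with two degrees of freedom, for which the p-value reduces to exp(-H/2); this approximation is coarse for very small groups, and the post hoc step is omitted. The function name and data are hypothetical.

```python
from math import exp


def kruskal_wallis_three_groups(groups):
    """Kruskal-Wallis H and approximate p-value for exactly three groups.

    groups: three non-empty lists of observations (e.g., one list per TMP).
    The p-value uses the chi-square approximation with 2 degrees of freedom.
    """
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("this sketch needs exactly three non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    # Average ranks: tied values share the mean of the positions they occupy.
    positions = {}
    for idx, v in enumerate(sorted(pooled), start=1):
        positions.setdefault(v, []).append(idx)
    rank_of = {v: sum(p) / len(p) for v, p in positions.items()}
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction factor.
    tie_term = sum(len(p) ** 3 - len(p) for p in positions.values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # all observations identical
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    return h, exp(-h / 2)


# Hypothetical concentrations for three TMPs with no overlap between groups.
h_stat, p_value = kruskal_wallis_three_groups([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_stat, 2), p_value < 0.05)  # 7.2 True
```

With only two groups (i.e., fewer than three samplings per period), this test would not be applicable in the form used here, which is why a two-sample test such as Mann–Whitney is suggested for that case.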

With the relatively recent advances in the technologies for in situ water quality measurements, the acquisition and transmission of data at intervals shorter than 15 min are possible for some of the parameters we studied. However, few studies have addressed the planning of high sampling frequency (sub-daily) WQMNs (Jiang et al. 2020). This condition could be partially attributed to the greater interest in establishing WQMNs to capture long-term trends that do not demand high-frequency sampling. Additionally, the high purchase and maintenance costs can limit the expansion of automated systems (e.g., sensors, multiparameter probes, data loggers) (Pellerin et al. 2016). Relatively frequent visits to the monitoring sites, as well as collection and analysis of samples using traditional methods, are still needed with automated systems to ensure the probes are producing reliable data and not drifting. Sensors for DO, turbidity, pH, solids, and ammonia are now available for several in situ water quality probes. Thus, high-frequency monitoring of such parameters and others may aid in understanding daily water quality variations, the biogeochemical interactions affecting the water quality, and the effects of rainfall events on the water quality (Vilmin et al. 2018; Nguyen et al. 2019; Jiang et al. 2020). This is particularly important for documenting short-term low DO excursions that could harm aquatic life (Vilmin et al. 2018). Daytime sampling may not capture these critical episodes. In this regard, CETESB established an automated WQMN in 1998, aiming to regulate industrial and domestic sources of pollution and to monitor the river water quality for public supply (CETESB 2021). In 2020, this network comprised 17 monitoring sites with a five-minute measurement frequency for DO, turbidity, pH, specific conductance, and water temperature (CETESB 2021). Thus, future initiatives that address the planning and the review of high sampling frequency WQMNs and the interpretation of the available data from these networks could complement the approach of the present study and improve the water resources management in Brazil and other countries in general.

Conclusions

Our study indicated the possibility of updating sampling frequency in the São Paulo State WQMN due to temporal data redundancy of some WQI parameters. In the UGRHIs 01, 06, and 11, the number of annual samplings could be reduced from six to two, four, and four, respectively. The sampling frequency in UGRHI 14 should remain bimonthly (i.e., six annual samplings). The contrasting patterns in minimal sampling frequencies from the studied UGRHIs reinforced the importance of more customized approaches for planning and reviewing monitoring sampling frequencies. We emphasize that adaptive management requires assessment of drivers and responses and that a suggested decrease in sampling frequency should be re-visited every few years, particularly when there are strong changes in watershed characteristics.

Future research initiatives based on our results could contribute to understanding the drivers of water quality variability in the studied UGRHIs and in tropical and subtropical areas in general. UGRHI 14 could be a starting point since it presented the highest proportions of clusters where more than one sampling in the wet and dry seasons was recommended for E. coli, total phosphorus, turbidity, and DO. The year-round variability of DO and E. coli is particularly relevant for the whole study area since they presented the highest number of UGRHIs demanding more than one sampling in either the wet or the dry season. Incorporating data from hydrometeorological networks (e.g., precipitation, streamflow, temperature) and high-frequency water quality data (sub-daily) would benefit future studies since climate seasonality, discharge fluctuations, and temperature variations could be significant drivers of water quality variability. Such integration is a challenge in São Paulo State and Brazil in general because the hydrometeorological and the water quality networks are traditionally run separately and not integrated, leading to unpaired data. Additionally, high-frequency automated water quality sampling is just beginning to be implemented and is restricted to a few monitoring sites that aim at regulating anthropogenic activities.

The main advantage of our approach is the simplicity of implementation since it depends on well-established statistical analyses, which are widely disseminated and available in several open-access programs. We expect our methodology could serve as a basis for both WQMN optimization and adaptive water management in other developing countries. This approach is particularly relevant for these countries, where the optimization guidelines are still limited and the financial resources are usually scarce for water quality monitoring. Additionally, representative data are essential in developing countries due to poor water quality status in several watersheds associated with high water demand for contrasting uses. When feasible, the savings from sampling frequency reduction could provide resources to expand the network to areas with a lack of data and to create more automated water quality networks. The financial impacts of this reduction could be even more important in Brazil, where the field campaigns exert high pressure on the monitoring costs due to logistical challenges like (1) distances between offices and monitoring sites and (2) distances between monitoring sites and analytical labs. Thus, WQMN optimization is crucial to increase the representativeness of water quality data in the spatial and temporal dimensions. Future research initiatives are still needed to compare our approach with different methodologies to determine their agreement and efficiency in optimizing WQMNs worldwide.