1 Introduction

Only the continuous operation of water quality monitoring systems can provide (i) sufficient information regarding the current state of the water bodies, (ii) a basis for prompt intervention in the case of environmental emergencies, and (iii) sufficient data for forecasting in the interest of sustainable development (Burton 1987). Still, when there is a need for budget restraints (Chilundo et al. 2008), decision makers tend to target environmental protection, as decisions in this area (e.g. decreasing the spatiotemporal sampling frequency of monitoring systems) mostly affect future generations rather than having a direct and immediate impact. Science, however, has a right and a duty to analyze the consequences of such decisions.

In particular, it is the responsibility of the scientific community to ensure that changes in sampling activities happen with as little information-loss as possible (Barca et al. 2013; Kovács et al. 2012a, b; Tanos et al. 2015). In recent years, there have been many studies published which deal with the spatial optimization/recalibration of monitoring systems or stand in close relation to this, even if not mentioning this aim explicitly.

Wu et al. (2010) and Chen et al. (2012) used matter element analysis to find “homogeneous” sections of rivers and in this way optimize monitoring networks. In the case of the latter, working on the 1890 km long upper and middle reaches of the Heilongjiang River in Northeast China, the input data were “compressed”, just as in the work of Kovács et al. (2012b), who used a special coded clustering technique to determine “homogeneous” regions in Lake Balaton, one of the water bodies examined later in this paper. Astel et al. (2006) used cluster analysis (CA) to separate urban drinking water quality zones in Poland, while Gomes et al. (2014) combined cluster and principal component analyses to group similar sampling locations of the Leça River (Portugal).

There are also other studies which combine statistics and remote sensing to calibrate the monitoring network of a water system, such as the work by Zhouhu et al. (2011), who managed to optimize the sampling grid of Lake Nansi in China using a grid estimation method.

Another possible way of recalibrating the spatial sampling density is to use the basic function of geostatistics, the semivariogram. In the study of Casper et al. (2012) geospatial processing and geostatistics were combined to determine different water quality zones on the Hillsborough River. Ferreyra et al. (2002) also employed the semivariogram on soil water data, while the same approach has been used on shallow groundwater data by Hatvani et al. (2014a), and on datasets representing groundwater-surface water interaction by Kennedy et al. (2008). The spatio-temporal revision of the shallow groundwater monitoring network of an agricultural area in Northern Italy was also prepared using a combination of geostatistics and evolutionary polynomial regression (Barca et al. 2013). The main drawback of semivariograms is that they can only be applied to each parameter separately. From these publications it becomes clear that there are many methods available when it comes to recalibrating a monitoring system.

The main aims of this study were (i) to show how to determine whether there is redundancy between the already functioning sampling sites, and (ii) to make proposals for the location of new sampling sites for various surface and sub-surface water systems using a novel method, combined cluster and discriminant analysis (CCDA, first introduced in Kovács et al. 2014). With the aid of CCDA, the optimization of any monitoring system, regardless of its surface or sub-surface characteristics, could be achieved objectively, so that a decrease in sampling sites would cause only minimal information loss.

2 Materials and Methods

2.1 Description of the Study Areas

This study deals with datasets from four different areas (Fig. 1), which will be presented in the following subsections.

Fig. 1 Location of the four study areas: River Danube (a), Kis-Balaton Water Protection System (KBWPS; b), Lake Balaton (c) and the watershed of Lake Neusiedl/Fertő (WLN; d)

2.1.1 River Danube

The River Danube is the second longest river in Europe with a watershed area of ~817,000 km2, stretching 2,872 km between the Black Forest (Germany) and the Black Sea (Romania). The Hungarian section (Fig. 1a), with a mean runoff of 2,000 m3s−1, has a length of 417 km, with numerous large islands. The two largest are Szentendre Island (31 km long) and Csepel Island (48 km long). Besides naturally occurring islands, an artificial installation, a hydroelectric power plant (Gabcikovo-Nagymaros Water Barrage System at the Hungarian-Slovakian border), also divides the main branch into side-branches. This system was put into operation in 1992 and greatly modified this river section. As a result, 80 % of the runoff coming from the main branch was diverted to Slovakia, to the headwater section of the river, leaving 400 m3s−1 in Hungary, in the main branch. The tail water section, downstream of the power plant, reaches the main branch at river kilometer (rkm) 1806 (Kovács et al. 2015).

Regarding the Danube’s tributaries in the Hungarian section, four natural ones are worth mentioning (heading downstream): the Rába/Raab with 27 m3s−1, the Vág/Váh with 196 m3s−1, the Garam/Hron with 55 m3s−1, and the Ipoly/Ipel with 21 m3s−1; the Sió Canal is also significant. Being the periodically-functioning outflow of Lake Balaton, the runoff of the Sió mainly depends on the amount of water discharged. For example, between 2001 and 2005 its average runoff was only 20 m3s−1, because only smaller streams contributed to its discharge as there was no water arriving from Lake Balaton.

It should be mentioned that the Danube is a major economic factor and riverine transportation route. Furthermore, its delta is under UNESCO (United Nations Educational, Scientific and Cultural Organization) protection (Popescu et al. 2015). Thus, the preservation of its water quality is of key importance.

2.1.2 Lake Balaton and the Kis-Balaton Water Protection System

The second area studied is Lake Balaton (Fig. 1c), the largest shallow freshwater lake in Central Europe (Padisák and Reynolds 2003). It is a nationally-important tourist attraction and recreation area. The lake’s surface area is 596 km2, its average water depth is 3.2 m, and its morphologically diverse watershed covers approximately 5181 km2, in which 51 inflows are located. The mean depth and surface area of the lake’s four geographical basins increase from west to east, while the area of the corresponding sub-watersheds decreases. The largest tributary, the River Zala, supplies almost 50 % of the lake’s total water input; the remaining 50 % comes from other inputs, such as streams, canals, surface runoff and precipitation. The River Zala used to enter the lake through the Kis-Balaton Wetland (KBW; Fig. 1b) (Korponai et al. 2010) at its westernmost and smallest basin (Keszthely Basin). The only outflow is the Sió Canal at the easternmost end of the lake (Hatvani et al. 2015). A weir was installed in the late 19th century to maintain a steady water level in Lake Balaton by means of the Sió, so that the Budapest-Fiume (today Rijeka, Croatia) railway would be protected from floods. The lake’s water level was fixed at 2–3 m below the mean level of the original one. Because of numerous measures taken (Hatvani et al. 2015), including the one discussed, the KBW partially dried up and decreased in effectiveness as a filtering area for the River Zala’s waters (Lotz 1988). However, other ideas on the separation of the KBW from Lake Balaton also exist (Zlinszky and Timár 2013).

The major goal in establishing the Kis-Balaton Water Protection System (KBWPS; Fig. 1b) was to replace the nutrient-retaining capacity of the once-functioning KBW. The construction of the KBWPS was planned in two phases. Phase I (the 18 km2 Upper Reservoir, inundated in 1985) is an algae-dominated open pond. In Phase II, after its partial inundation in 1992, only a 16 km2 area was functioning. Its water space is covered by reed-dominated macrophytes, and the processes in this area are dominated by decomposition. The remaining areas (the Lower Reservoir, total area 51 km2) were put into operation in 2014.

The direction of water flow in the system leads from Kb4 to 203, with only minor inputs along the way. However, after sampling site 203 it becomes “hectic”. From there it becomes difficult to determine a consecutive order between the sampling sites, as it is hard to determine which sampling site is reached by the water flow first. Moreover, the inputs to Phase II complicate the situation even further (Fig. 1b).

2.1.3 Shallow Groundwater System of the Watershed of Lake Neusiedl (SGS—WLN)

The last area studied, the SGS—WLN (approximately 3000 km2; Fig. 1d), is located in Burgenland, Austria, bordering Hungary next to Lake Neusiedl. The lake itself is the largest steppe lake in Europe (Magyar et al. 2013a). It is protected under the Ramsar Convention and as an International Union for Conservation of Nature protection zone; it is also a UNESCO biosphere reserve, and has even been designated as a World Heritage site (Magyar et al. 2013b). Naturally, neither the watershed of Lake Neusiedl nor the lake itself ends at the national border. Unfortunately, however, the data gathered from Hungary were not dense enough to be included in the research along with the Austrian data.

Most of the shallow groundwater system is located in an unconfined (~95 %) mixed gravel and sand aquifer complex, generally 5 to 25 m thick (Blaschke and Gschöpf 2011). Only a small part of it, in the upper northern part, can be considered as confined or semi-confined. The conductivity is generally low and varies widely in the aquifer (Kroiss et al. 2002, 2004). The relief of the aquitard is undulating and sometimes interaction can be witnessed between the groundwater and the salt lakes in the area (Hatvani et al. 2014a). The amount of water taken for irrigation purposes varies between the seasons. Nearly nothing is pumped during winter, while during summer pumping depends mainly on the climatic conditions and varies from only 1–2 l s−1 up to 80–100 l s−1 for a single irrigation well. Note that the pumping is constrained by regulations when the groundwater goes below a certain level. The wells generally used for irrigation in the area tap the same aquifer. Only one drinking water well is located in the region, and it draws water from the second aquifer. The average distance between the shallow groundwater monitoring wells is 4 km, with a mean screening depth around 11.25 m (min: 5 m; max: 30 m). In an undisturbed situation the flow velocity varies from 0.06 to 1 m d−1.

2.1.4 Dataset Used

Besides their geographical characteristics, the study areas also varied in (i) the number of parameters, (ii) the length of the time intervals examined, and (iii) the authorities taking the samples, as summarized in Table 1. In the cases of Lake Balaton and the KBWPS all the samples were taken by the same authorities on the same day at all the sampling sites and measured in the same laboratory for the whole time interval, in contrast to the SGS—WLN and the River Danube. In the latter two cases the samples were not taken on the same day; in addition, in the case of the Danube, they were measured by different authorities. It should be noted, however, that the laboratories and the measurement methods were inter-calibrated to ensure comparability of the data.

Table 1 Parameters used for the multivariate analysis with CCDA. The parameters listed were used together in the analysis for each study area

In the study only non-derived, measured variables were used. Missing values occurred in <0.5 % of the cases and were handled. As a result no missing or outlying values were present.

2.2 Methodology

2.2.1 Combined Cluster and Discriminant Analysis

Combined cluster and discriminant analysis (CCDA) is a multivariate data analysis method introduced by Kovács et al. (2014) with the goal of finding not only similar, but even homogeneous groups in measurement data of known origin (in this study, water quality monitoring sampling sites). CCDA consists of three main steps: (I) a basic grouping procedure, e.g. using hierarchical cluster analysis (HCA), to determine possible groupings; (II) a core cycle in which the goodness of the groupings from Step I and the goodness of random classifications are determined using linear discriminant analysis (LDA); and (III) a final evaluation step in which a decision concerning the further iterative investigation of sub-groups is taken (Fig. 5 in the Appendix). In this paper, the investigation of sub-groups obtained from the Initial Round of CCDA (i.e. the one run for the basic grouping containing all sampling sites) is called Round 2 (including the execution of the core cycle for the sub-groups), while the investigation of the sub-groups obtained as a result of Round 2 is called Round 3.

The main idea of CCDA is that once the ratio of correctly classified cases for a grouping (“ratio”) is higher than at least 95 % of the ratios for the random classifications (“q95”), i.e. the difference d = ratio − q95 is positive, then at the level of α = 0.05 the given classification is not homogeneous. In this case, the division into sub-groups (Step III) and the iterative investigation of these sub-groups for homogeneity are required. Further details of the method and the ccda R package used in this study can be found in Kovács et al. (2014).
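The decision rule can be illustrated with a minimal sketch. It is not the ccda R package of Kovács et al. (2014): the function names, the toy data, and the number of random classifications (200) are assumptions made for illustration, and a one-variable nearest-centroid rule stands in for LDA (for two equally sized groups and a single variable, both rules place the boundary at the midpoint between the group means).

```python
import random
from statistics import mean


def correct_ratio(values, labels):
    """Share of cases that a nearest-centroid rule assigns to their own group.

    Assumption: this one-variable rule is a stand-in for LDA.
    """
    groups = sorted(set(labels))
    centroids = {
        g: mean(v for v, lab in zip(values, labels) if lab == g) for g in groups
    }
    hits = 0
    for v, lab in zip(values, labels):
        predicted = min(groups, key=lambda g: abs(v - centroids[g]))
        hits += predicted == lab
    return hits / len(values)


def difference_value(values, labels, n_random=200, seed=0):
    """Return (ratio, q95, d), where d = ratio - q95."""
    rng = random.Random(seed)
    ratio = correct_ratio(values, labels)
    shuffled = list(labels)
    random_ratios = []
    for _ in range(n_random):
        rng.shuffle(shuffled)  # random classification with the same group sizes
        random_ratios.append(correct_ratio(values, shuffled))
    random_ratios.sort()
    # Nearest-rank empirical 95 % quantile of the random ratios.
    q95 = random_ratios[(95 * n_random + 99) // 100 - 1]
    return ratio, q95, ratio - q95


# Hypothetical single-parameter data for two sampling sites.
site_a = [0.1 * i for i in range(10)]          # values between 0.0 and 0.9
site_b = [10.0 + 0.1 * i for i in range(10)]   # values between 10.0 and 10.9
labels = ["A"] * 10 + ["B"] * 10

ratio_sep, q95_sep, d_sep = difference_value(site_a + site_b, labels)
# Identical values at both sites: the grouping carries no information.
ratio_same, q95_same, d_same = difference_value(site_a + site_a, labels)

not_homogeneous = d_sep > 0   # positive d: divide into sub-groups (Step III)
```

In this toy setting the clearly separated sites yield a positive d, whereas two sites with identical values yield a non-positive d and would be treated as one homogeneous group.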

2.2.2 Visualization of Differences and Similarities of Sampling Locations

A new method for visualizing differences in CCDA values between pairs of sampling sites is introduced in this paper. In the easier case of a “linear feature” study area, e.g. a river with a clear direction of water flow, consecutive pairs of sampling locations can be compared with CCDA in a meaningful way, as the biggest pairwise difference values will ultimately show in which regions of the study area the greatest changes in water quality can be found. It should be noted that in order to have comparable pairwise difference values, the number of measurements has to be equal. This can be achieved by resampling the dataset of the site with the larger number of measurements, thereby decreasing its sample size to that of the site with fewer data. The resulting plot with the consecutive pairwise difference values is called 1DdP (1 Dimensional difference Plot).
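The preparation of a consecutive pair for the 1DdP might be sketched as follows. The function names and the records are hypothetical, and drawing at random without replacement is an assumed reading of “resampling”; the text does not specify the scheme.

```python
import random


def equalize_sample_sizes(site_a, site_b, seed=0):
    """Downsample the larger dataset so both sites have equally many measurements.

    Assumption: resampling means drawing rows at random without replacement.
    """
    rng = random.Random(seed)
    n = min(len(site_a), len(site_b))
    reduced_a = rng.sample(list(site_a), n) if len(site_a) > n else list(site_a)
    reduced_b = rng.sample(list(site_b), n) if len(site_b) > n else list(site_b)
    return reduced_a, reduced_b


def consecutive_pairs(sites_in_flow_order):
    """Pair each sampling site with its downstream neighbor."""
    return list(zip(sites_in_flow_order, sites_in_flow_order[1:]))


# Hypothetical measurement records: 12 at one site, 8 at the next.
upstream = [("2001-%02d" % m, 7.0 + 0.1 * m) for m in range(1, 13)]
downstream = [("2001-%02d" % m, 7.5 + 0.1 * m) for m in range(1, 9)]
up_eq, down_eq = equalize_sample_sizes(upstream, downstream)

pairs = consecutive_pairs(["D1", "D2", "D3", "D4"])
```

Each pair returned by `consecutive_pairs` would then be passed, after equalization, to the CCDA core cycle to obtain one difference value per river section.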

The extension of this approach to a non-linear space is the so-called 2DdM (2 Dimensional difference Map). In order to create such a map, the difference values for the paired neighboring sites should be calculated and assigned to the midpoint between the two sites. Based on these values an isoline map can be created. This procedure is much more complicated than in the case of a linear feature; it nevertheless provides an informative visualization of the difference values. The 2DdM can give meaningful results especially when the scale of the difference values is large (i.e. some homogeneous and some heterogeneous groups are to be found). However, if most of the groups are not only non-homogeneous, but their observations in the parameter space even form disjoint sets (indicated by a ratio of correctly classified cases of 100 % for that grouping), their comparison based on the difference values is no longer possible in any meaningful way. Here the reason is that beyond the fact that they can be perfectly separated by the (linear) plane (i.e. ratio = 100 %), LDA cannot tell how far apart the disjoint sets lie.
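The midpoint assignment behind the 2DdM can be sketched as below. The site names, coordinates and difference values are hypothetical, and which sites count as neighbors is taken as given, since the text does not prescribe a rule for it. The resulting triples would still have to be interpolated by a contouring tool to obtain the isoline map.

```python
def midpoint(p, q):
    """Midpoint of two (x, y) coordinates."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def difference_map_points(coords, pair_differences):
    """Attach each pairwise difference value to the midpoint of its two sites.

    coords: site name -> (x, y); pair_differences: (site, site) -> d in percent.
    Returns (x, y, d) triples ready for interpolation into isolines.
    """
    points = []
    for (a, b), d in pair_differences.items():
        x, y = midpoint(coords[a], coords[b])
        points.append((x, y, d))
    return points


# Hypothetical coordinates (km) and difference values (%) for three sites.
coords = {"S1": (0.0, 0.0), "S2": (4.0, 0.0), "S3": (0.0, 2.0)}
pair_differences = {("S1", "S2"): 12.5, ("S1", "S3"): -3.0, ("S2", "S3"): 20.0}
map_points = difference_map_points(coords, pair_differences)
```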

3 Results

First the results from the Danube will be presented in detail, since it is the largest among the study areas, with the highest number of inflows and other factors affecting its water quality. Moreover, this is the most challenging system from a data analysis perspective (for details see Section 2.1.4). The results from the three other areas will be discussed only briefly, with more emphasis on visualization: scatter plots, spline interpolation, difference mapping.

After data preparation, the first step was to group the twelve sampling sites of the Danube using HCA to obtain a basic grouping of the sampling locations (Fig. 2a). After executing the core cycle of CCDA for this basic grouping (Fig. 2b), the highest difference value (31.3 %) was obtained for the grouping into four groups. This resulted in one sub-group containing one sampling site, two sub-groups with three sampling sites, and one with five sampling sites in it (Fig. 2c/Initial Round).

Fig. 2 Basic grouping in the Initial Round of CCDA (a) along with the corresponding difference values (b) and the chart representing the summarized results of each Round of CCDA (c). Homogeneous groups found on the River Danube using CCDA (d), where colored squares (red and orange) mark the sampling sites forming one homogeneous group together, and the 1 Dimensional difference Plot for consecutive sampling sites on the River Danube (e)

In Round 2 the question of whether the four groups obtained are homogeneous or not was explored. The North-West group, containing three sampling sites (D1–D3), fell into three sub-groups (d{{D1},{D2},{D3}} = 27.9 %); D4 had already separated in the Initial Round. In the next group (D5–D7), D5 separated from D6 & D7 (d{{D5},{D6,D7}} = 4.8 %). In Round 3, D6 & D7 did not separate, because of their negative difference value (d{{D6},{D7}} = −4.9 %). In the southernmost group, containing D8–D12, D8 and D9 separated from sites D10–D12 in Round 2. In the last Round (3) D8 and D9 separated from each other, while sites D10–D12 belonged to one homogeneous group (d{{D10},{D11},{D12}} = −3 %; Fig. 2c).

As a final result, the 12 sampling sites of the Danube fell into 9 homogeneous groups, 7 of which contain only one sampling site (D1, D2, D3, D4, D5, D8, D9), one group contains two sampling sites (D6 & D7), and one other contains three sites (D10–D12; Fig. 2c and d).
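The bookkeeping behind this summary can be checked with a few lines. The variable names are illustrative, and counting every site beyond the first in a group as redundant is an interpretation that mirrors how the Lake Balaton results are read in the Discussion; which site of a group is kept remains a management decision.

```python
# Final homogeneous groups on the Danube, as reported in the text.
final_groups = [
    ["D1"], ["D2"], ["D3"], ["D4"], ["D5"],
    ["D6", "D7"], ["D8"], ["D9"], ["D10", "D11", "D12"],
]

n_sites = sum(len(g) for g in final_groups)
n_groups = len(final_groups)
n_single = sum(1 for g in final_groups if len(g) == 1)
# Sites beyond the first in each group repeat information already captured.
n_redundant = n_sites - n_groups

print(n_sites, n_groups, n_single, n_redundant)  # prints: 12 9 7 3
```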

The Danube can be considered as a linearly ordered system. This allows meaningful pairwise comparison of consecutive sampling sites using the 1DdP to see in which sections of the river the greatest changes in water quality can be found. Comparing the consecutive pairs of sampling sites, it became clear that the largest difference could be found between D1 and D2, the second largest between D4 and D5, and small but still significant differences could be observed between D2 and D3, for example (Fig. 2e). The previously found homogeneous groups of D6 & D7 and D10 & D11 and D11 & D12 can also be seen. It should be noted that the equal number of samples for each sampling site (a necessity) was achieved by resampling, as described in Section 2.2.2.

In the case of the KBWPS, in the Initial Round of CCDA the sampling sites fell into three groups. Sampling site 205 formed a group alone, while the two remaining groups approximately covered the two constructional phases. These separated (Round 2) into further sub-groups with difference values of 15 and 21 %. As a final result the twelve sampling sites of the system separated into eleven homogeneous groups with only Kb10 & Z11 forming one group together (Fig. 6a in the Appendix).

Since the sampling sites between Kb4 and 203 (excluding Kb9, which is located in a separate experimental area (Hatvani 2014)) are located in the direction of water flow, as in the case of the Danube, it was possible to compare their pairwise difference values using a 1DdP. This shows that all consecutive pairs differ significantly except Kb10 & Z11, which contain redundant information. Particularly high differences were observed between certain pairs, e.g. 202i & 203 (Fig. 6b in the Appendix).

The next study area, Lake Balaton, is in direct connection with the KBWPS. Its 10 sampling sites fell into four groups in the Initial Round of CCDA (d{{B1,B2,B3},{B4,B5},{B6,B7,B8,B9},{B10}} = 26.3 %; Fig. 7a in the Appendix). In Round 2, three of the four groups obtained previously remained unchanged (d{{B1},{B2},{B3}} = −5.5 %, d{{B4},{B5}} = −6.2 %). Meanwhile, the group containing sampling sites B6–B9 separated into two sub-groups (d{{B6,B7},{B8,B9}} = 4.7 %; Fig. 7b in the Appendix). In the case of the obtained sub-groups (B6 & B7 and B8 & B9), there was no need for further division (d{{B6},{B7}} = −6.3 %, d{{B8},{B9}} = −7.5 %). In summary, the sampling sites of Lake Balaton were grouped into five homogeneous groups by CCDA (Fig. 7c in the Appendix). Unlike the Danube and a part of the KBWPS, the lake is clearly not linear with respect to water flow. Hence, it would only make sense to extend a 1DdP into a 2DdM to visualize the spatial change in the difference values. This, however, was not carried out in the case of Lake Balaton because:

  • the number of the sampling sites in general is too low to create a meaningful map;

  • the sampling sites are not even close to being equally distributed in space;

  • the surface of the lake is intersected by the Tihany Peninsula, which separates it into two parts.

The last studied area (SGS—WLN) was chosen to serve as an example for the applicability of CCDA on a sub-surface water system with a higher number of sites. The 50 studied sampling sites were separated into 49 groups in the Initial Round (d > 85 %; Fig. 8 in the Appendix) and into 50 groups in the Second (final) Round.

The pairwise comparison of the sites can again be performed, but it should be extended from one to two dimensions due to the lack of linear ordering of sampling locations. As a pilot area for the 2DdM, the Seewinkel, a sub-region of the WLN, was chosen because of its dense monitoring network.

The pair-wise difference values arranged on the midpoints between the sampling sites are very large (around 33 %), concurring with the findings on all the sampling sites at once, where all the sites belonged to different homogeneous groups. However, the spatial variability of the pair-wise difference values was low (Fig. 3). It turned out that the reason behind this was that measurements belonging to different groups formed disjoint sets in the parameter space.

Fig. 3 2DdM of the Seewinkel, a sub-region of the WLN. Black dots mark the sampling sites and the crosses the midpoints between two neighboring sites to which the difference values were assigned

4 Discussion

4.1 River Danube

In the case of the Danube, heading downstream, sampling sites D1 and D2 separated because of the Gabčikovo (Bős) hydroelectric power plant. The Danube was diverted upstream of D1, resulting in a decreased runoff (from 2,000 to 400 m3 s−1; Kovács et al. 2015). Being located after the tailwater section reaches the main branch, sampling location D2 represents the Danube with its original runoff (2,000 m3s−1). This results in significantly different water quality to that at D1. The tributaries can be held responsible for the separation of the further sampling sites, concurring with the findings of Sharp (1971). The waters of the Rába/Raab reach the Danube between sites D2 and D3, causing their separation, while the separation of D3 and D4 is the result of the confluence with the Vág/Váh. D4 and D5 were separated by the Ipoly/Ipel and the Garam/Hron Rivers.

In the 200 rkm long river section between D5 and D9 no tributaries are located. Nevertheless, numerous homogeneous groups were formed consisting of one or multiple sites. The reason for the separation of sites D5 and D6, for example, is Szentendre Island, which splits the Danube into two branches. An interesting situation occurs at sites D6 & D7: they form one homogeneous group despite the fact that Budapest, the capital of Hungary (1.7 M inhabitants), is located between them. This could be because during the period studied 51 % of the sewage waters of the capital (originating from bank-filtered Danube water) were treated at two sewage treatment plants. Therefore, the chemical composition of the effluent (8 m3s−1) was similar, in the case of the studied parameters, to that of the river, and its runoff was marginal in comparison to that of the river as well. This may be the reason for sites D6 & D7 not separating. Note that since 2009 the proportion of treated sewage water in the capital has reached 95 % with the installation of an additional treatment plant.

Moving on, similar to the separation of D5 and D6 caused by Szentendre Island, sampling sites D7 and D8 separated because of the 48 rkm long Csepel Island. The island splits the river into a main branch and a side-branch. The composition of the water in the side-branch can change significantly, despite its low runoff (25 m3s−1) compared to that of the main branch. Sampling sites D8 and D9 also separated, but only to a small degree. The reason behind this is still an open question, since there are neither tributaries nor artificial obstacles in the area.
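To put the flows mentioned for this river section into proportion, the following sketch expresses them as shares of the mean runoff. It assumes that the 2,000 m3s−1 mean runoff given for the Hungarian section as a whole also applies around Budapest and Csepel Island, which the text does not state explicitly; the names are illustrative.

```python
MEAN_RUNOFF = 2000.0  # m3/s, mean runoff of the Hungarian section (from the text)


def share_percent(flow, reference=MEAN_RUNOFF):
    """Flow expressed as a percentage of the reference runoff."""
    return 100.0 * flow / reference


effluent_share = share_percent(8.0)      # treated sewage effluent of Budapest
side_branch_share = share_percent(25.0)  # side-branch at Csepel Island
```

Under this assumption the effluent amounts to about 0.4 % and the side-branch to about 1.25 % of the mean runoff.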

D9 and D10 separated because of the Sió Canal, while D10–D12 formed one homogeneous group, since there were no such influences in this case.

Based on the pairwise comparison of consecutive sampling locations, the 1DdP showed that the highest differences were caused by the water barrage system, the tributaries and Szentendre and Csepel Islands. If the possibility of establishing new sites arises, these should be the locations which receive primary consideration, starting with the location where the highest difference was observed. Practical examples that further explore the reasons behind the final grouping are given in Fig. 9 in the Appendix.

4.2 Kis-Balaton Water Protection System

In the KBWPS, CCDA pointed out that with the exception of one pair, all of its sampling sites hold exclusive information; there is almost no spatial redundancy in the system. This fact reflects the KBWPS’s character as an artificially engineered wetland (Hatvani et al. 2014b). The only two sampling sites belonging to one homogeneous group are those closest to each other in space (~500 m; Kb10 & Z11). The managing personnel have suspected since the system began operating that Kb10 & Z11 are redundant, and since Z11 is the outlet of Phase I, Kb10 should be the one left out if need be.

In addition, the biggest pairwise difference was observed between sampling sites 202i and 203 (d{{202i}, {203}} = 32.5 %), where the ecological border between the two Phases stabilized in 1998 (Hatvani et al. 2011). The differences within the Phases were significantly smaller than those between the two. Worthy of mention are Kb6 and Kb7 (d{{Kb6}, {Kb7}} = 21.6 %), located on the two sides of the main dam in Phase I. Kb6 is still in shallow water densely covered by macrophytes at the entrance of the Small Komárom Channel, unlike Kb7. Again, if the opportunity of settling a new site becomes a reality, the areas with the highest differences should be first considered, along with the system’s local characteristics and the authorities’ knowledge of the area.

Because the main water source of Lake Balaton reaches it through the KBWPS, it is highly important for the monitoring network to be optimized in space, so the water quality of the lake’s primary input can be controlled more effectively, and prompt interventions made if necessary.

4.3 Lake Balaton

The situation at Lake Balaton was slightly different from the other studied areas. The water of the lake is still, and the water quality samples are taken on the same day.

Here, based on the CCDA results, only five out of the 10 sampling sites were proven to be necessary in the period 1998–2004. After the publication of the Water Framework Directive (European Council 2000), its goals had to be implemented at the national level, thus the spatial monitoring of the surface and sub-surface water bodies had to be revised (Barca et al. 2015; Højberg et al. 2009). Research was conducted on Lake Balaton using a special coded hierarchical cluster analysis (Kovács et al. 2012b) to achieve this aim. In the case of the present study, CCDA gave a similar result to the original research of Kovács et al. (2012b). However, CCDA is capable of handling scenarios in which the sampling falls on different days, while the coded cluster approach used in Kovács et al. (2012b) is not. In this respect it was fortunate that the dataset of Lake Balaton consisted of measurements taken on the same day. Moreover, CCDA provides an objective index-number to determine the homogeneity of the obtained groups.

4.4 Shallow Groundwater System of the Watershed of Lake Neusiedl/Fertő

During the revision of the shallow groundwater monitoring wells, the higher number of sampling locations did not cause any difficulty for CCDA, apart from the somewhat longer computational time required. The results indicated that all 50 sampling sites of the SGS—WLN provide exclusively different information. As there is no redundancy in the monitoring network, each and every sampling site is necessary. The large differences between the sites could be attributed to, e.g., the nature of the reservoir rock, the composition of the topsoil, vegetation, changes in topography, water flow direction, surface waters, the superposition of the different types of groundwater flow systems, and anthropogenic activity. All of these could cause local anomalies (see Hatvani et al. 2014a) resulting in the differing behavior of the sites.

Due to the great differences, it is suggested that new monitoring sites be set up in order to better understand the spatial behavior of the shallow groundwater on a much more localized scale. The 2DdM could help in this process by visualizing the magnitude of differences between water quality areas.

For the Seewinkel, however, the pairwise difference values are of approximately the same magnitude (around 33 %; Fig. 3), because the measurements belonging to different sampling sites form disjoint sets in the parameter space, and hence can be perfectly separated by a plane in LDA. Beyond this, LDA cannot tell how far apart these disjoint sets are, which is an obvious limitation for the meaningful comparison of pairwise difference values for the Seewinkel.
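This saturation can be illustrated with a toy sketch. The data and function name are hypothetical, and a one-variable midpoint rule stands in for LDA (for two equally sized groups and a single variable, the LDA boundary is also the midpoint between the group means). Note also that, with ratio = 100 % and d around 33 %, the definition d = ratio − q95 implies a q95 of roughly 67 % for these pairs.

```python
from statistics import mean


def correct_ratio(group_a, group_b):
    """Share of cases on the correct side of the midpoint between group means."""
    threshold = (mean(group_a) + mean(group_b)) / 2
    if mean(group_a) <= mean(group_b):
        low, high = group_a, group_b
    else:
        low, high = group_b, group_a
    hits = sum(v < threshold for v in low) + sum(v > threshold for v in high)
    return hits / (len(group_a) + len(group_b))


base = [0.0, 0.1, 0.2]             # hypothetical values at one site
near = [v + 1.0 for v in base]     # disjoint set separated by a small gap
far = [v + 100.0 for v in base]    # disjoint set separated by a large gap

ratio_near = correct_ratio(base, near)
ratio_far = correct_ratio(base, far)
```

Both ratios equal 1.0, so a difference value built on them carries no information about whether the two sets lie close together or far apart.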

Nevertheless, in more fortunate cases, a 2DdM can be highly informative. As an example, the dataset of the surface water monitoring system of Lake Neusiedl/Fertő from the original CCDA study of Kovács et al. (2014) is presented (Fig. 4). The 2DdM was produced from the sampling sites of the lake located in its diverse habitats (Magyar et al. 2013a). It was clear that the sites in the vicinity of the reed belt were significantly different to those in the open water, while the sites from the same habitat close to each other form homogeneous groups. Exceptions were sites 11 and 13, where the River Wulka, next to site 13 was responsible for the observed lack of homogeneity (Fig. 4). Thus, the 2DdM for Lake Neusiedl/Fertő itself presented a good example of how this kind of mapping can provide useful information in a situation where the scale of the difference values is much larger (here ranging from −10 to +20 %). In such a case, the map can be interpreted, and is capable of yielding excess information about the area investigated. This might serve as a good starting point when deciding where to set up new sampling sites.

Fig. 4 2D difference map of the Western part of Lake Neusiedl/Fertő with sampling sites 11 and 13 marked

5 Conclusions

Sustainable and optimized monitoring is the basic requirement of water quality protection. It is vital that the data obtained from monitoring represent the water body in question as accurately and economically as possible. Specifically, (i) sampling sites should measure the whole area, (ii) they should not contain redundant information, and (iii) a clear picture should be obtained of where to set up new sampling sites if the option arises. To meet these aims, CCDA was used to classify sampling locations of various surface and sub-surface water bodies even operating under different monitoring strategies into homogeneous groups, leading to the possibility of optimization regardless of the number of sampling sites, inner processes, etc.

In the case of the river, CCDA was able to handle the circumstance of samples not being taken on the same day at the different sites. It was pointed out that it is unsatisfactory merely to determine the number and location of sampling sites based on the location of the mouths of the tributaries, and that effects of the branches or artificial installations should be taken into account as well. In the areas where only local characteristics could be held responsible for the lack of homogeneity (e.g. on a longer river section without inflows), a denser monitoring network is recommended.

In the artificially constructed and engineered KBWPS, with the exception of one pair, all of the sampling locations measure exclusively separate underlying phenomena. Using the 1DdP it became clear which areas should be considered first if new sites are to be set up. Lake Balaton as the next study area gave essentially the same results as a previously published study of Kovács et al. (2012b).

In the sub-surface water system in the watershed of Lake Neusiedl/Fertő, all the monitoring sites held exclusively different information. The big differences between the different sites are presumably the result of the large distances, in combination with some other driving factors (e.g. reservoir rock, topsoil, flow direction, etc.). The 2DdM, the spatial extension of the 1DdP, was shown to be useful in certain cases when a monitoring network is revised.

The results indicated which sampling sites hold redundant information, and which areas should be considered if new sampling sites are to be set up. In particular, the presented methodology can be applied successfully to monitoring networks of even larger scales, as well as to other arbitrary ecological or environmental systems, where a common set of parameters is measured at different sampling locations multiple times.