1 Description of the research area

The Tahtalı dam is in the southwestern part of İzmir city, southwest Turkey (Fig. 1). This dam was constructed to supply about 4,055 l/s of drinking water from the Tahtalı River to İzmir city (Gülbahar 1999). The construction of the Tahtalı dam was completed in 1996. The topographical conditions were suitable for supplying clean water from the reservoir to İzmir city by gravity flow. A general view of the Tahtalı dam site is given in Fig. 2.

Fig. 1 Location map of the Tahtalı dam site

Fig. 2 General view of the Tahtalı dam site

The maximum reservoir capacity is 307,000 hm3, whereas the minimum capacity is 56,000 hm3 (Kılınç and Gülbahar 1997). The watershed of the Tahtalı dam covers a surface area of 515 km2 and extends along the Cuma plain. It is the catchment of the Tahtalı River, which runs NE–SW through the Cuma plain. About 50% of the area is covered with forest and 30% is used for agriculture; the remainder is urban (Atış 1999).

2 Hydrochemical evaluation of the Tahtalı watershed

Human activities near the Tahtalı dam site have had direct and indirect effects on the rates of contamination of surface and groundwater in the Tahtalı watershed area. Direct effects include dissolution and transport of excess quantities of fertilizers with associated materials and hydrologic alterations related to irrigation and drainage. Indirect effects include changes in water–rock reactions in soils and aquifers caused by increased concentrations of dissolved oxidants, protons, and major ions.

Agricultural activities have directly or indirectly affected the concentrations of a large number of inorganic chemicals in groundwater, for example NO3, N2, Cl, SO₄²⁻, H+, K+, Mg2+, Ca2+, Fe, Cu, B, Pb, and Zn, as well as a wide variety of pesticides and other organic compounds. For reactive contaminants like NO3, it is suggested that a combination of hydrochemical and environmental-tracer analytical approaches might be required to resolve changing inputs from subsequent alterations as causes of concentration gradients in groundwater.

The results of the hydrochemical analyses are given in Table 1. The main water type of the Tahtalı lake is Ca–Mg–HCO3. The river water type was directly related to the hydrogeochemical properties of the study area.

Table 1 Hydrochemical analysis of the Tahtalı lake site (mean annual values)

3 Material and methods

3.1 Statistical evaluation of hydrochemical data

Water quality data were collected from the four sites shown in Fig. 3 for three years between 1996 and 2000, and some of the data were obtained from DSİ (General Directorate of State Hydraulic Works). The results were evaluated statistically using monthly averaged values of the parameters. The SPSS and NCSS statistical programs were used for the analyses, which were carried out to confirm the water quality changes with respect to the relationships between different concentrations. The statistical analyses were carried out in two parts. In the first part, summary statistics such as the mean, median, minimum, maximum, and the 25th and 75th percentiles were calculated (NCSS 2004).

Fig. 3 Tahtalı Dam site and main sample points

In the second part, the data were investigated using statistical methods such as the median test, normality tests, parametric and non-parametric correlation analyses, and regression analyses. These methods were applied to water quality data from the four sample sites to identify any relations among them. The median test is applied to check whether the medians of the water quality data from the four sample sites (Dereboğazı, Bulgurca, Menderes and Gölcükler) are equal. The medians of two or more samples can be used to help make decisions about the equality of the centers of the samples. Comparing medians, rather than means, can yield a substantially different conclusion as to whether two distributions differ in terms of some measure of central location (Wilcox 1991). Because the median is minimally affected by the magnitude of observations, being determined solely by their relative order, this resistance to changes in value or to outlying observations is often a desirable property (Dennis and Hirsch 2002). A non-parametric (distribution-free) test is commonly used to compare two independent groups of sampled data. Since more than two independent groups are compared here, the Kruskal–Wallis test is used instead of the Mann–Whitney U test. This non-parametric test does not assume that the data follow a particular distribution (e.g., normality).
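As a rough illustration of the Kruskal–Wallis comparison described above, the following sketch computes the H statistic with the usual tie correction and compares it with the large-sample chi-square critical value for four groups. The function names and the sample concentrations are hypothetical and are not taken from the study data; the chi-square approximation is also only approximate for short series.

```python
from itertools import chain

# Upper 5% point of the chi-square distribution with 3 degrees of freedom
# (four groups minus one), used as the large-sample critical value for H.
CHI2_CRIT_DF3 = 7.815


def average_ranks(values):
    """Return 1-based ranks for `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, with tie correction, for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Hypothetical monthly concentrations for the four sites (illustrative only).
sites = {
    "Dereboğazı": [0.05, 0.07, 0.06, 0.08, 0.05, 0.09],
    "Bulgurca": [0.04, 0.06, 0.05, 0.07, 0.06, 0.08],
    "Menderes": [0.15, 0.22, 0.18, 0.30, 0.25, 0.19],
    "Gölcükler": [0.12, 0.20, 0.16, 0.28, 0.21, 0.17],
}
h = kruskal_wallis_h(list(sites.values()))
print(f"H = {h:.3f}; equality rejected at 5%: {h > CHI2_CRIT_DF3}")
```

A statistical package would normally report an exact or approximate p-value as well; the fixed critical value is used here only to keep the sketch free of external dependencies.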

There are several methods to test normality. Three tests are chosen for the present study: the Kolmogorov–Smirnov (K–S) test, the Anderson–Darling test and the Shapiro–Wilk test. The K–S test determines if two data sets differ significantly. It has the advantage of making no assumption about the distribution of the data. The K–S test is a non-parametric, distribution-free test, which means the critical values do not depend on the specific distribution being tested. Here the data were compared with normal distribution values; the K–S test statistic is the maximum difference between the cumulative proportions of the observed distribution and the expected cumulative normal distribution.
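The K–S statistic defined above can be sketched in a few lines. The helper names and the sample values below are hypothetical. The sketch also assumes that the mean and standard deviation of the reference normal distribution are estimated from the sample itself, which the text does not specify; in that case the common large-sample 5% critical value 1.36/√n is conservative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(sample):
    """Maximum distance between the empirical CDF and a fitted normal CDF."""
    data = sorted(sample)
    n = len(data)
    # Assumption: the reference normal uses the sample mean and sample
    # standard deviation rather than externally specified parameters.
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def ks_critical_5pct(n):
    """Large-sample 5% critical value for a fully specified distribution."""
    return 1.36 / sqrt(n)


# Hypothetical monthly COD values (illustrative only, not the study data).
cod = [11.2, 9.8, 12.5, 10.1, 13.0, 9.5, 11.9, 10.7, 12.2, 10.4, 11.5, 9.9]
d = ks_statistic_normal(cod)
print(f"D = {d:.3f}, critical value = {ks_critical_5pct(len(cod)):.3f}")
print("normality rejected at 5%:", d > ks_critical_5pct(len(cod)))
```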

In the present study, decisions were based on the comparison of the test statistics with critical values at the 5% significance level. The Anderson–Darling test is also used to test if a sample of data came from a population with a normal distribution. It is a modification of the K–S test and gives more weight to the tails than the K–S test does. The Anderson–Darling test makes use of the specific distribution in calculating critical values.

The statistical methods in the present study are applied to check the normality of the periodic field measurements and to discern the relationships among water quality parameters in different areas with respect to different sources of contamination. Correlations were therefore carried out in two groups: the first group represents the correlations of each water quality parameter among sample sites, and the second group represents the correlations among water quality parameters within each sample site (NCSS 2004).

The first group of correlations shows the correlation of each water quality parameter between the four sites, for example, the correlation between the BOD of Dereboğazı and the BOD of the other sites. The second group of correlation analyses shows any dependency of one water quality parameter on the others at each site. For example, for the Dereboğazı site, the correlation of BOD with the NO3, NO2, PO4 and NH4 parameters, as well as the correlation of COD with Cu, Pb, and Fe, were investigated. For the correlations between sites, three correlation coefficients were investigated: the parametric Pearson correlation coefficient and the non-parametric Kendall’s tau_b and Spearman’s rho correlation coefficients (SPSS 2004).
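The three coefficients named above can be computed side by side. This is a minimal sketch with hypothetical function names and made-up BOD series for two sites; it is not the SPSS implementation, and it omits the significance tests that accompany the coefficients in practice.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation (linear association)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant and discordant pairs, with ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from every term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + tied_x_only) * (untied + tied_y_only))


# Hypothetical monthly BOD values at two sites (illustrative only).
bod_site_a = [2.1, 2.4, 3.0, 2.8, 3.5, 4.1, 3.9, 4.4]
bod_site_b = [1.8, 2.0, 2.9, 2.5, 3.1, 3.0, 3.6, 4.0]
print("Pearson r   :", round(pearson(bod_site_a, bod_site_b), 3))
print("Spearman rho:", round(spearman(bod_site_a, bod_site_b), 3))
print("Kendall tau :", round(kendall_tau_b(bod_site_a, bod_site_b), 3))
```

For the same data, Kendall's tau is typically smaller in magnitude than the other two coefficients because it counts pair orderings rather than measuring distances.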

Pearson’s correlation coefficient is a measure of linear association. If the relationship between two variables is not linear, Pearson’s correlation coefficient is not an appropriate statistic for measuring their association. Since regression requires normality of the data, linearity and normal distribution are assumed in the correlation and regression analyses in this paper. The high population and the increase in industrial activities in İzmir city play an important role in the water contamination problems. Therefore, the main objective of each statistical method is to examine the relationships between different physical and chemical properties of the water quality at the dam site, for example, an increase in BOD due to an increase in nutrient contents (N, P), or an increase in COD due to an increase in heavy metals (Cu, Pb, and Fe).

Table 2 illustrates that the median values of the nitrite, phosphate, Fe and NH4 parameters are not equal (the null hypothesis is rejected) at the 5% significance level. Three normality tests (S–W, Shapiro–Wilk; K–S, Kolmogorov–Smirnov; A–D, Anderson–Darling) are applied to the four sample sites and the results are compared in Table 3.

Table 2 Kruskal–Wallis one way ANOVA on ranks test
Table 3 Normality test results in each water quality data for four sample sites

The table illustrates that, for the nitrate and nitrite values of all sites, there is not enough evidence to reject normality. The normal probability plots of the COD and Cu data of the Gölcükler site, which have the highest test values, are given in Fig. 4 along with 95% confidence intervals. The normality tests indicate that normality of the data cannot be rejected, as can also be seen from the figure: the data follow the theoretical straight line of the normal distribution very closely and are hence taken to be normally distributed.

Fig. 4 Normal distribution of COD and Cu data of Gölcükler site

The correlation value of the dependent variable is shown as one in Table 4. The Dereboğazı site is taken as the dependent variable except in the COD (2) and Cu (2) correlation analyses, which use the Gölcükler site as the dependent variable and show the correlation of the other sites with the Gölcükler site. Here, the histograms of the variables are not all normally distributed, for two main reasons: (1) changes in the amount of contaminants in different periods and in the type of contamination sources in each sampling area; and (2) periodic changes in different unrelated parameters. Therefore, the main reasons for testing the normality of the field data are to verify the normality of the related parameters, to identify periodic changes in the field measurements, and finally to clarify the distribution of periodic changes in water quality in different areas with respect to changes in these related concentrations. The lower values of Kendall’s tau do not mean that tau is less sensitive than Pearson’s coefficient, but simply that a different scale of correlation is being used (Dennis and Hirsch 2002). Strong correlations were observed between sites; significance at the 0.05 and 0.01 levels is indicated by asterisks (*) and (**), respectively.

Table 4 Correlations among four sites of water quality data

The COD (2) analysis shows a very strong correlation between the Gölcükler and Menderes sites and between the Gölcükler and Bulgurca sites. The second part of the correlation analyses was performed on the water quality data within each site to show the dependency among pollution parameters. This analysis is performed in two groups: correlation of BOD with NO3, NO2, PO4 and NH4, and correlation of COD with Cu, Pb and Fe, for each site; the results are given in Tables 5 and 6, respectively. Table 5 shows a negative correlation between BOD and NO3; however, there is a positive correlation between BOD and NH4 values for the Dereboğazı and Bulgurca sites. Table 6 shows the correlations between COD and heavy metals. Regression analyses were carried out for the sites with significant correlations and linear models were constructed. The linear models take the form:

$$ \text{Site 1} = A + B*\text{Site 2} + C*\text{Site 3} + D*\text{Site 4} + \dots $$

The model is a multiple linear regression model, which is the extension of the simple linear regression model to multiple explanatory variables. A is the intercept, B is the slope coefficient for the first explanatory variable (Site 2), C is the slope coefficient for the second explanatory variable (Site 3), etc.
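A least-squares fit of a model of this form can be sketched with the normal equations. The function names and the data are hypothetical, and the dependency-free solver is an illustrative choice; a statistical package, or a routine such as `numpy.linalg.lstsq`, would normally be preferred for numerical stability.

```python
def fit_linear_model(y, predictors):
    """Least-squares fit of y = A + B*x1 + C*x2 + ...; returns [A, B, C, ...]."""
    n = len(y)
    # Design matrix rows: a leading 1 for the intercept, then each predictor.
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    aug = [
        [sum(r[a] * r[b] for r in rows) for b in range(k)]
        + [sum(r[a] * yi for r, yi in zip(rows, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def r_squared(y, predictors, coeffs):
    """Coefficient of determination of the fitted model."""
    fitted = [
        coeffs[0] + sum(c * p[i] for c, p in zip(coeffs[1:], predictors))
        for i in range(len(y))
    ]
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Hypothetical COD series: Site 1 explained by Sites 2 and 3 (illustrative).
site2 = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0]
site3 = [8.0, 9.5, 9.0, 11.0, 12.0, 10.0]
site1 = [9.1, 10.9, 10.2, 13.2, 13.1, 11.6]
coeffs = fit_linear_model(site1, [site2, site3])
print("A, B, C =", [round(c, 4) for c in coeffs])
print("R^2 =", round(r_squared(site1, [site2, site3], coeffs), 4))
```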

Table 5 Correlation of BOD with NO3, NO2, PO4 in each site. BOD is taken as dependent variable
Table 6 Correlation of COD with Cu, Pb, Fe in each site. COD is taken as dependent variable

Some of the linear models, with their constants and R2 values, are given in Table 7. Since there is a significant correlation between BOD and NH4 at the Dereboğazı and Bulgurca sites, regressions between BOD and NH4 were fitted and are given below:

$$ \begin{aligned} \text{BOD}_{\text{Bulgurca}} &= -1.4581 + 100.38*\text{NH}_{4,\,\text{Bulgurca}}, \quad R^{2} = 0.8798 \\ \text{BOD}_{\text{Dereboğazı}} &= 0.2592 + 48.884*\text{NH}_{4,\,\text{Dereboğazı}}, \quad R^{2} = 0.6660 \end{aligned} $$
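The two fitted equations above can be applied directly to estimate BOD from a measured NH4 value. In this sketch the coefficients are copied from the equations, while the function name, the dictionary layout and the example NH4 values are hypothetical. Units follow whatever units the source tables use for BOD and NH4.

```python
# (intercept, slope, R^2) copied from the fitted BOD-NH4 equations.
BOD_NH4_MODELS = {
    "Bulgurca": (-1.4581, 100.38, 0.8798),
    "Dereboğazı": (0.2592, 48.884, 0.6660),
}


def predict_bod(site, nh4):
    """Estimate BOD at `site` from an NH4 value using the fitted line."""
    intercept, slope, _r2 = BOD_NH4_MODELS[site]
    return intercept + slope * nh4


# Hypothetical NH4 readings (illustrative only).
for site, nh4 in [("Bulgurca", 0.05), ("Dereboğazı", 0.10)]:
    print(f"{site}: NH4 = {nh4} -> estimated BOD = {predict_bod(site, nh4):.4f}")
```

Because the Bulgurca intercept is negative, that line returns negative BOD for very small NH4 values, so estimates are only meaningful within the range of the measurements used for the fit.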

For comparison, the measured values and regression line of the BOD and NH4 parameters at the Dereboğazı site are given in Fig. 5. Water quality parameters were investigated statistically and strong relations were shown among the four sites. The sample points were chosen so that they lie in sequence along the river within the watershed. Dereboğazı is at the outlet of the Tahtalı dam and Bulgurca is at the inlet. The Menderes and Gölcükler sites are remote points that feed the inlet of the Tahtalı dam. Industrial activities are reported near the Menderes site and agricultural activities near the Gölcükler site (Atış 1999).

Table 7 Regression analyses between sites for water quality parameters
Fig. 5 Graphical representation of measured values and regression line of BOD versus NH4 for Dereboğazı site

As a result of the study, the pollution at the sites and the effect of pollution downstream can be estimated from the correlation coefficients in Tables 4, 5, and 6. Based on the correlation analyses, estimated models were constructed and are given in Table 7. The water quality parameters and their effects can be detected and estimated from the upstream parameters of the sites (Brooks and Thomas 1991).

4 Results and discussion

The statistical analysis covers 3 years (36 months) of mean monthly values and is used to evaluate the monthly changes in different parameters, to confirm the suitability of the statistical methodology for this kind of environmental problem, and finally to clarify the changes in water quality parameters. The mean monthly values of the water quality data for the four sample sites are given in Fig. 6.

Fig. 6 Graphical representation of water quality at the dam watershed (1996–2000)

The graphical presentations illustrate the periodic analyses of the NO3, NO2, PO4, NH4, BOD, COD, Fe, Cu, Pb and TDS contents for each site, that is, the contamination in terms of the statistical methodology. The results of the analyses showed that pollution from industrial and agricultural activities is significant at the Gölcükler and Menderes sites, where there are high fluxes of the heavy metals Fe, Cu and Pb and high nutrient contents. Here, some elements are added to the hydrosphere directly as components of fertilizers or other additives, whereas other elements are associated indirectly with fertilizer loadings or land disturbance through enhanced rates of leaching from natural sources in soils.

Results showed that high nitrate concentrations might occur due to leaching of NO2, NO3, NH4 and PO4 from fertilizers and biocides during irrigation of agricultural land around the dam site in January, February, and April. Moreover, leakage from overloaded sewage networks is also a source of nutrient and heavy metal pollution. The dam lake is highly contaminated by extreme NO2, NO3, NH4, heavy metal and phosphate loads originating from irrigation drainage, sewage, and deteriorating septic treatment systems.

The main sources of nitrate in the study area are fertilizers applied during irrigation and untreated wastewater discharged directly to the lake. The periodic changes in heavy metal concentrations (Fe, Cu, and Pb) against water temperature and TDS showed that the indirect effects have important implications for geochemical studies of water–rock interactions and can represent sources or sinks for a variety of problematic contaminants such as heavy metals and/or toxic trace elements (Elhatip et al. 2004). Moreover, the medians and quartiles of the data are given as box plots in Figs. 7 and 8. In the box plots, green and red dots show the mild and severe outliers, respectively. The figures illustrate the median and quartiles of the BOD, COD, PO4, nutrient and heavy metal contents for each variable around the dam site.

Fig. 7 Median and quartiles of BOD, COD and Nutrient contents

Fig. 8 Median and quartiles of TDS, Cu, Pb and Fe contents
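The quantities shown in box plots such as Figs. 7 and 8 can be sketched as follows. The fence multipliers (1.5 and 3 times the interquartile range for mild and severe outliers) are a common convention and are assumed here rather than taken from the source; the function name, the quartile interpolation method and the sample values are likewise hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles plus mild and severe outliers under the assumed fences."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: inner fences at 1.5 * IQR, outer fences at 3 * IQR.
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    mild, severe = [], []
    for v in values:
        if v < outer_low or v > outer_high:
            severe.append(v)  # beyond the outer fences
        elif v < inner_low or v > inner_high:
            mild.append(v)  # between the inner and outer fences
    return {"q1": q1, "median": median, "q3": q3,
            "mild": mild, "severe": severe}


# Hypothetical monthly Fe concentrations (illustrative only).
fe = [0.11, 0.14, 0.12, 0.15, 0.13, 0.16, 0.12, 0.14, 0.25, 0.13, 0.48, 0.15]
summary = box_plot_summary(fe)
print("Q1, median, Q3:", summary["q1"], summary["median"], summary["q3"])
print("mild outliers:", summary["mild"], "severe outliers:", summary["severe"])
```

Different packages interpolate quartiles slightly differently, so fence positions can shift a little between tools for short series.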

5 Conclusions

The statistical methods were applied mainly to confirm the water quality changes in the dam watershed. As a result of the statistical analysis used in this study, the water quality parameters and their effects are detected and estimated from the upstream parameters of the sites. The relationships between different physical and chemical parameters in the present study showed that the Tahtalı dam lake is highly contaminated by extreme N–P loading, especially during dry periods when eutrophication processes usually take place in static water masses. The process usually occurs due to the continuous increase of nitrogen and phosphorus and the decrease of O2 in the water. This may cause an anaerobic medium and stimulate algal growth in the lake. Therefore, continuous pollution of the Tahtalı dam reservoir may consequently reduce the quality of the lake water.

Common agricultural practices have caused substantial increases in groundwater recharge fluxes and in the concentrations of major elements and a variety of heavy metals. The increases are related to transient agricultural chemical signals in aquifers that have groundwater residence times of decades or more. Some elements are added to the hydrosphere directly as components of fertilizers or other additives. Others are associated indirectly with fertilizer loadings or land disturbance through enhanced rates of leaching from natural sources in soils. This leads to physical and biological changes and to increased acidity or ionic strength of agricultural recharge.

In addition, high fluxes of NH4 and NO3 in recharging groundwater have resulted in unnaturally high concentrations of chemical species released from aquifer materials by oxidation reactions in the saturated zone, because the electron demand of NO3 in agriculturally contaminated recharge is commonly several times higher than that of dissolved O2 in uncontaminated recharge.

In cases where N and P loads exceed normal conditions, dramatic quality changes will occur in the water mass. Moreover, identifying the quality and quantity of water pollutants originating from non-point sources in order to solve water pollution questions is a difficult and complex problem. Since the analyses were carried out on 3 years (36 months) of data, which may not be enough to determine the environmental impact of pollution on the watershed, it is necessary to collect more data in the coming years at the same stations to validate the derived equations. Therefore, the only real solution to the environmental impacts of pollution is the determination of protection zones and implementation of good land use planning to maintain control of the activities in watersheds.