Introduction

Water quality is affected by point sources of pollution such as wastewater treatment plants and non-point sources such as farmlands (Kibena et al. 2014). However, the impact of individual factors, in terms of their specific contributions to the quality of water, is uncertain (Zhang et al. 2009; Wang et al. 2013). Domestic and industrial wastewater discharges constitute a constant point source of pollution, whereas surface flow changes depend on seasonal conditions (Singh et al. 2004; Shrestha and Kazama 2007).

The water quality of freshwater bodies is assessed in multiple ways, based on physical, chemical and biological processes. In recent years, various techniques have been developed to assist in the management of water quality, including mathematical models, optimised approaches and integrated decision-support systems (Huang and Xia 2001; Taner et al. 2011).

The largest areas affected by environmental pollution are water resources. In order to obtain a good understanding of spatial and temporal changes in water resource characteristics, it is necessary to have data indicating the quality of the water, and this should be gathered on a continuous basis. In this way, it will be much easier to assess the conditions of water resources and to develop solutions should problems occur (Arslan and Yildirim 2011). For the assessment of water quality data, each parameter has been defined in a separate regulation and standard. A comparison of water quality parameters obtained in the field with limit values that are set according to those standards leads to a conclusion as to whether the water is fit for its intended purpose. Making such an assessment for each and every parameter may reveal that the water under examination is usable for some parameters but unusable for others. As such, a shift from the conventional method, in which parameters are considered individually, towards an assessment that follows a multi-parametric approach, and where many parameters are assessed at the same time, would in most cases furnish more appropriate results. Such an assessment is critical, particularly for the classification of the quality of water in natural environments (Barıs et al. 2007).

The common technique used to interpret the quality of river waters is a univariate procedure that fails to characterise the similarities and variations between samples and variables (Dixon and Chiswell 1996). Reducing the quantity of environmental data, in combination with a multi-component interpretation of physical and chemical measurements and the application of a descriptive data analysis, would make the procedure much easier (Wenning and Erickson 1994). The generic term multivariate statistics refers to the statistical methods used to analyse two or more variables at the same time (Shaw 2003). Water quality systems involve many interrelated parameters, and more than one unit and variable interact with one another, so the system has to be evaluated as a whole. As a result, there is an apparent need for a multivariate statistical analysis method (Ünlükaplan 2008). Many studies have adopted multivariate statistical analysis methods (Bulut et al. 2010; Mahmood et al. 2011).

The purpose of this study was to perform multivariate statistical analyses on the large data set obtained from water quality assessments of 24 parameters, carried out twice annually over a 3-year period (2011 to 2013) by the State Hydraulic Works (DSI) at four points in the Coruh Basin in the Eastern Black Sea Region of Turkey, and to evaluate water quality based on these data. Using multivariate statistical analysis, we interpreted the data set, determined the similarities and differences between the sample collection points, identified the physicochemical parameters affecting water quality and determined the potential sources of pollution (natural, industrial and anthropogenic).

Materials and methods

Study area

The River Coruh originates in the Bayburt Province within the territorial borders of Turkey and follows its main course for 431 km, the last 20 km of which is in Georgian territory, where it discharges to the Black Sea (Sucu and Dinc 2008).

Monitoring sites

The data used in this study were supplied by the DSI, which took samples from four different points along the River Coruh; the Regional Directorate of Trabzon Province carried out the analyses. The first sampling point was at the site of the Muratli Dam on the River Coruh, close to the Turkish border. The Muratli Dam is a cofferdam located 100 m from the Turkey-Georgia border. The site is 2 km upstream from Muratli Village and 17 km downstream from the district of Borcka. The dam is used for flood protection and energy generation, while the reservoir occupies an area of 4.1 km² and has a volume of 74.8 hm³. The second sampling point is at the mouth of the Kabaca River, downstream of the point where the Black Sea copper companies on the Murgul Stream discharge their wastewater. The third point is the Borcka Dam, which is situated 2.5 km upstream. The reservoir occupies an area of 10.8 km² and has a volume of 418.9 hm³. The fourth sampling point is on the Murgul Stream, upstream of the Black Sea copper companies. Murgul is a district of Artvin Province, known since 1973 for its large-scale copper processing and production. Figure 1 shows the study area and the points of measurement.

Fig. 1

Study area, surface flow quality and sampling stations

The water analysis points and the numbering scheme to be used in this article are given below in Table 1.

Table 1 Water analysis locations and names

Data preparation

This study makes use of measurement data for 24 water-quality parameters obtained by the DSI every 6 months over 3 years (2011–2013) from four sampling points on the River Coruh. Water samples taken from the observation points by DSI personnel are delivered the same day to the DSI laboratories in Trabzon for analysis. DSI laboratories are certified by the Turkish Accreditation Agency (TURKAK), and the test reports are recognised by the European Accreditation Cooperation (EA) and the International Laboratory Accreditation Cooperation by mutual agreement. The water quality parameters included temperature, pH, electrical conductivity (EC), total dissolved solids (TDS), sodium (Na), potassium (K), total hardness (TH), calcium (Ca), magnesium (Mg), bicarbonate (HCO3), total alkalinity (T-Alk), chloride (Cl), sulphate (SO4), ammonia, nitrite, nitrate, orthophosphate, dissolved oxygen, biological oxygen demand, permanganate, turbidity, total phosphorus, suspended solids and colour. The water quality parameters and abbreviations are given in Table 2, while Tables 3 and 4 provide a summary of the basic statistical evaluation.

Table 2 The water quality parameters, their abbreviations and the units used in this study
Table 3 Annual concentrations (mean values with standard deviations, SD) for water quality as well as comparison with environmental quality standards for surface water in Turkey
Table 4 Statistics of sampling results (mean values with standard deviations, SD) along Coruh Basin as well as comparison with environmental quality standards for surface water in Turkey

Data treatment and multivariate statistical methods

The Kolmogorov-Smirnov (K-S) test is used to check that the data are normally distributed at a confidence level of 95 % or higher. In order to show the compatibility of the data with PCA, the Kaiser-Meyer-Olkin (KMO) and Bartlett’s tests are applied. The KMO test is used to verify that the sample is adequate for analysis and gives a score between 0 and 1. The closer the result is to 1, the more suitable the sample is for factor analysis. The KMO value obtained from the statistical analysis of the water quality data is 0.54, indicating suitability for PCA. The Bartlett test is used to determine whether there is sufficient correlation among the variables. In this test, the null hypothesis is that all correlation coefficients are equal to zero; if p < 0.05 is obtained, the dataset is deemed suitable for factor analysis. For the water quality data the test gave a p value reported as 0, and as p < 0.05, it could be concluded that the dataset was suitable for factor analysis.
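The two suitability checks can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: the function names `kmo` and `bartlett_sphericity` are hypothetical, the input is synthetic data (24 rows standing in for 24 samples, 5 made-up variables), and NumPy is assumed to be available. The sketch computes the overall KMO measure from correlations and partial correlations, and the chi-square statistic of Bartlett's test of sphericity.

```python
import numpy as np

def bartlett_sphericity(data):
    """Chi-square statistic and degrees of freedom for Bartlett's test of sphericity."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet avoids underflow when the determinant is very small
    _, logdet = np.linalg.slogdet(corr)
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi_square, dof

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (between 0 and 1)."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations are obtained from the scaled inverse correlation matrix
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diagonal] ** 2)
    a2 = np.sum(partial[off_diagonal] ** 2)
    return r2 / (r2 + a2)

# Synthetic illustration only: one shared latent factor plus noise (not study data)
rng = np.random.default_rng(0)
latent = rng.normal(size=(24, 1))
weights = np.array([[1.0, 0.9, 0.8, 0.7, 0.6]])
samples = latent @ weights + 0.5 * rng.normal(size=(24, 5))

kmo_value = kmo(samples)
chi_square, dof = bartlett_sphericity(samples)
print(f"KMO = {kmo_value:.2f}, Bartlett chi-square = {chi_square:.1f} on {dof} df")
```

The p value for Bartlett's test follows from comparing the statistic with a chi-square distribution on the stated degrees of freedom, for example with `scipy.stats.chi2.sf(chi_square, dof)` if SciPy is installed.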

Analysing many variables one at a time can make the data difficult to evaluate, since separate analyses of individual variables disregard the relationships between them. Multiple variables can be expected to be related to a greater or lesser degree, and it was for this purpose that methods of multivariate analysis were developed (Arslan and Yildirim 2011). Multivariate statistical methods have proven to be useful for the interpretation, modelling and classification of large datasets, assessments of water quality and the management of surface flows (Simeonov et al. 2003). This study made use of one-way ANOVA and principal component analysis (PCA)/factor analysis (FA), with the Statistical Package for the Social Sciences (SPSS 20) software used to analyse the data.

Analysis of variance (ANOVA)

Analysis of variance is a method used to decide statistically which effects act on a response at the same time (Vega et al. 1998) and to test whether the differences between group means are statistically significant. In this case, the objective was to identify the factor or factors that influenced the experiment (Celik 2012).
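As an illustration of the test, the one-way ANOVA F statistic can be computed by hand as the ratio of between-group to within-group mean squares. The sketch below uses made-up readings for four hypothetical sampling points, not values from this study, and the function name `one_way_anova_f` is an assumption introduced here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of readings."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual readings around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical readings of one parameter at four sampling points
readings_by_point = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
    [2.0, 3.0, 4.0],
]
f_stat, df_between, df_within = one_way_anova_f(readings_by_point)
print(f_stat, df_between, df_within)  # 9.0 3 8
```

A large F relative to the F distribution on (df_between, df_within) degrees of freedom gives a small p value; `scipy.stats.f_oneway` returns both the statistic and the p value if SciPy is available.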

Principal component analysis

PCA summarises the statistical correlation among the water-quality parameters with minimum loss of original information (Helena et al. 2000). A factor analysis involving PCA is used to reduce the dimensionality of multiple interrelated datasets while preserving as much of their variability as possible. This reduction is achieved by transforming the data into a new set of orthogonal (uncorrelated) variables, the principal components (PCs), arranged in descending order of importance. Mathematically speaking, the PCs are calculated from the covariance matrix, which describes the dispersion of the repeatedly measured parameters, by obtaining its eigenvalues and eigenvectors. The PCs are linear combinations of the original variables, weighted by the eigenvectors (Wunderlin et al. 2001; Singh et al. 2004). Varifactors (VFs) are obtained by rotating the axes defined by PCA; factor rotation does not alter the basic mathematical properties of the solution. Following rotation, the loading of an item on one factor increases while its loadings on the other factors decline. Thus, each factor picks out the items that are highly correlated with it and so can be interpreted much more easily (Tabachnick and Fidell 2001; Büyüköztürk 2002).
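The eigen-decomposition step can be sketched as follows. This is an illustration under stated assumptions rather than the SPSS workflow used in the study: the data are synthetic, the function name `pca_from_correlation` is hypothetical, the correlation matrix (the covariance matrix of standardised variables) is used so that parameters with different units are comparable, and the varimax rotation step is not included.

```python
import numpy as np

def pca_from_correlation(data):
    """PCA via eigen-decomposition of the correlation matrix.

    Returns eigenvalues (descending), explained-variance ratios,
    loadings and component scores.
    """
    # Standardise each variable (zero mean, unit sample variance)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)

    # eigh is intended for symmetric matrices; it returns ascending eigenvalues
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    explained = eigenvalues / eigenvalues.sum()
    # Loadings: correlations between original variables and components
    loadings = eigenvectors * np.sqrt(np.clip(eigenvalues, 0.0, None))
    # Scores: the standardised data expressed in the new orthogonal axes
    scores = z @ eigenvectors
    return eigenvalues, explained, loadings, scores

# Synthetic illustration only: 24 samples of 6 made-up variables
rng = np.random.default_rng(42)
latent = rng.normal(size=(24, 1))
weights = np.array([[1.0, 0.8, 0.6, 0.0, 0.0, 0.0]])
samples = latent @ weights + 0.5 * rng.normal(size=(24, 6))

eigenvalues, explained, loadings, scores = pca_from_correlation(samples)
# Retain components whose eigenvalue exceeds 1
n_retained = int(np.sum(eigenvalues > 1.0))
print(n_retained, np.round(explained, 3))
```

Because the components are orthogonal, the covariance matrix of the scores is diagonal, with the eigenvalues on the diagonal; this is what allows each component's share of the total variance to be read off directly.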

Results and discussion

Summary of water quality

The annual and seasonal changes in water quality parameters are evaluated in Table 3, according to the mean and standard deviation data. Water quality parameters were evaluated against the Turkish Water Pollution Control Regulation, published in Official Gazette No. 25687 dated 31 December 2004; the limit values listed in Turkish Standard (TS) 266 (TSE 2005); and European Directive 75/440/EEC. An evaluation of the annual mean values with respect to Turkish standards revealed that, between 2011 and 2013, measured pH values varied between 7.15 and 8.34, electrical conductivity (EC) varied between 5.8 and 117.1 μS/cm, and total dissolved solids (TDS) varied between 35.9 and 878.3 mg/L. These measurements indicated that the water samples ranked as grade II in terms of water quality. The SO4 concentration varied between 10.40 and 1269 mg/L during the 3-year period, with a mean SO4 concentration of 237.1 mg/L for 2011. The SO4 concentration for 2011 corresponded to a grade II water quality level, and grade III according to the limit values of the regulation, while the maximum SO4 concentration corresponded to a grade IV water quality level. NH4 was 0.24 mg/L in 2011 and 0.28 mg/L in 2012, corresponding to quality grade II, and reached a maximum of 2 mg/L over the 3 years, corresponding to grade III. NO2 averaged 0.62 mg/L for 2011 and reached a peak value of 3.78 mg/L, placing the water in quality grade IV. Turbidity, total phosphorus and colour exceeded the limit values in 2011, 2012 and 2013, so the water was assessed as quality grade IV for these parameters.

An evaluation of the water quality data by measurement point shows a mean SO4 concentration of 241.85 mg/L at the position downstream of the Murgul Stream Black Sea Copper Company, which ranks the water quality at grade IV. NO2, total phosphorus, colour and turbidity were evaluated to be high, showing a grade II water quality at every measurement point. Turbidity varied between 0.87 and 1000 NTU, a level well above the limits in 2011, 2012 and 2013. The highest figure was recorded at the copper mine outlet. Total phosphorus varied across a range of 0.19–0.31 mg/L, ranking the quality of water at grade II. The mean total phosphorus value of 0.27 mg/L measured at the mining site exit ranks the water at quality grade IV, from which it can be inferred that the copper mine is a significant factor affecting the water quality.

According to European Directive 75/440/EEC, the mean annual measurements of the water quality parameters between 2011 and 2013 were between 35.9 and 878.3 mg/L for TDS, between 10.4 and 1269 mg/L for SO4, between 0 and 2.00 mg/L for NH4, and between 2.5 and 71 Pt-Co for colour. Based on these values, the samples were considered grade III water quality. An evaluation of the water quality values at the sampling points indicated that the TDS and colour values were above the limit values. The SO4 concentration was 241.8 mg/L downstream of the Murgul Stream Black Sea Copper Company and 167.7 mg/L upstream of it. Based on these SO4 values, the samples were considered grade III water quality. The BOD value was 3.16 mg/L at the Coruh River Muratli Dam sampling location, 3.1 mg/L at the Coruh River Borcka Dam sampling location and 3.3 mg/L at the sampling location downstream of the Murgul Stream Black Sea Copper Company. The samples were therefore considered to have grade II water quality.

Spatial variation of water quality

The box-plot graph allows the differences between the measuring points to be identified and visually summarises the quantitative data. The straight line across the box represents the median. The lower and upper edges of the box show the first and third quartiles (Q1 and Q3). Whiskers extend from the box to the lowest and highest observed values that fall within the limits Q1 − 1.5 (Q3 − Q1) and Q3 + 1.5 (Q3 − Q1). Individual points falling outside these limits are represented with an asterisk (Vega et al. 1998).
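The quantities drawn in a box plot can be computed directly, using the interquartile range Q3 − Q1 for both whisker limits. The following sketch uses hypothetical readings, not measurements from this study, and the function name `boxplot_summary` is an assumption. Quartile conventions differ between software packages; the "inclusive" method of Python's `statistics.quantiles` is just one of them.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, whisker limits, whisker ends and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers end at the most extreme observations inside the limits
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

# Hypothetical readings with one extreme value
readings = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = boxplot_summary(readings)
print(summary["lower_fence"], summary["upper_fence"], summary["outliers"])
# -3.0 13.0 [100]
```

Here Q1 = 3 and Q3 = 7, so the limits are 3 − 1.5 × 4 = −3 and 7 + 1.5 × 4 = 13; the reading of 100 falls outside them and would be plotted as an individual point.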

The box-plot graphs generated for the water-quality data are shown in Fig. 2. According to the box-plot graphs, some water quality parameters differed significantly between sampling points, while others differed less. Only minor differences existed between the mean values of pH, NH4, NO2, PO4, DO and total phosphorus, and no reading exceeded the limit values. EC, TDS, SO4, TH, Turb, SS, Ca, K and Mg took higher values at the first measurement point than at the third and fourth points, their data were more widely distributed, and some measured values exceeded the limit values. For BOD, the measurement results exceeded the limit value at the first measurement point, while the measured values at the second and third measurement points covered a wider range. The colour parameter gave similar values at the first and fourth measurement points; its spread widened at the second and third measurement points, almost reaching the upper limit at the second measurement point and decreasing towards the lower limit at the third. The T-Alk and HCO3 values also differed across the measurement points and occasionally exceeded the limits; readings were higher and more widely spread at the first and third measurement points than at the second and fourth. At the second measurement point, EC, TDS, Ca and total hardness showed a wide spread and remained close to the limit value, while K, MnO4, SS, SO4, TP and Mg showed limit-exceeding values and a broad dispersion. Water quality was also observed to be strongly influenced by the fact that the second measurement point lies downstream of the Black Sea Copper Companies.

Fig. 2

Box-plot graph for water quality parameters (1 Coruh River Muratli Dam Location, 2 Murgul Stream Blacksea Copper Companies Downstream Position, 3 Coruh River Borcka Dam Location, 4 Murgul Stream Blacksea Copper Companies Upstream Position)

Evaluation of data on water quality

ANOVA is used to test whether the mean values of more than two independent groups differ from one another. According to the one-way ANOVA carried out to identify differences in the water quality parameters between the measurement points, the pH, EC, TDS, K, TH, Ca, HCO3, T-Alk, turbidity, SS and colour parameters had p < 0.05, indicating a statistically significant difference between the readings at the four points (Table 5).

Table 5 ANOVA for water quality in each sampling time: Artvin/Turkey

Further analysis was carried out to identify the points between which these parameters varied, the results of which are shown in Tables 6 and 7. A result of p < 0.05 in this analysis indicated a significant difference between the first and second measurement points in the readings of nine parameters (pH, EC, TDS, K, Ca, TH, HCO3, T-Alk), and a significant difference was also observed between the first and fourth measurement points in the values of five parameters (pH, Na, K, HCO3, T-Alk). These results suggest statistically significant differences between the second and third measurement points in the readings of seven parameters (pH, EC, TDS, K, TH, HCO3, T-Alk), between the second and fourth measurement points in the readings of eight parameters (EC, TDS, Turb, SS, Colour, K, TH, Ca), and between the third and fourth measurement points in the readings of three parameters (pH, HCO3, T-Alk). Between the first and third measurement points, however, no significant difference was observed in any parameter. The lack of a significant difference between the Muratli and Borcka Dam sites indicates that water quality is similar at the two sites, while the results also suggest that the copper operations have had an adverse effect on water quality.

Table 6 ANOVA analysis results of water quality parameters
Table 7 ANOVA results for the parameters and locations showing statistically significant differences (p < 0.05): Artvin/Turkey

Principal component analysis

Scree-plot graphs are used to identify the number of PCs to retain in describing the underlying data structure (Liu et al. 2003). The vertical axis of the graph represents the eigenvalues, while the horizontal axis shows the factors. The graph is obtained by plotting the eigenvalue of each factor and joining the points. The point at which the steep decline in the curve levels off gives the number of important factors. The near-horizontal part of the curve shows that the additional variance contributed by each further factor is small and similar in size (Büyüköztürk 2002). A total of five principal components with eigenvalues greater than 1 were identified in the principal component analysis, and the curve did not change appreciably after the fifth principal component (Fig. 3).

Fig. 3

Scree plot of the eigenvalues

FA is used to establish the factors for water samples going through analysis and for normalised data (Singh et al. 2004). The results of the PCA, applied to the 24 parameters at four measurement points, are given in Table 8. The factor analysis identified five factors with eigenvalues greater than 1, each explaining a progressively smaller share of the total variance (Table 8). Together, these five factors explain 81.3 % of the total variance.

Table 8 Varimax rotated factor matrix for the whole data sets

The first factor (VF1) explains 34 % of the total variance, where EC, TDS, K, total hardness, Ca, Mg, SO4 and turbidity have strong positive weights while SS has a medium positive weight. These figures lead to the conclusion that the first factor is of mineral and industrial origin (Arslan and Yildirim 2011; Vega et al. 1998). Sulphate-rich and alkali rocks and chemical compounds with sulphate concentrations found in factory wastes, as well as sulphate-containing fertilisers mixing with the water, constitute the primary sources of sulphate in the water (Atay and Bulut 2005). This study considers mining operations to be the source of sulphate. The second factor affecting the water quality (VF2) explains 16.9 % of the total variance, where Na, HCO3, NH4, NO2 and PO4 have medium positive weights. This effect can be explained as resulting from non-point sources such as farmlands and/or domestic sources (Simeonov et al. 2003). HCO3 can be viewed as an indication of fresh water (Arslan and Yildirim 2011). The third factor affecting water quality (VF3) explains 13 % of the total variance, where colour has a medium positive weight and pH and total alkalinity have medium negative weights, attributable to the physical and chemical properties of the water. The reduction in the pH value of the water is a result of the hydrolysis of acidic substances (Vega et al. 1998).

The fourth factor affecting water quality (VF4) explains 9.5 % of the total variance, where Cl, NO3 and TP have medium positive weights and MnO4 a low positive weight. Chloride is an indication of water salinity, while NO3 and total phosphorus can be taken as an indication of domestic usage. The fifth factor (VF5) explains 7.8 % of the total variance, in which temperature and BOD have medium positive weights and DO a medium negative weight. This factor can be defined as deriving from anthropogenic pollution sources (Vega et al. 1998; Singh et al. 2004).
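As a quick arithmetic check, the variance shares quoted for the five varifactors can be accumulated as below. The percentages are the rounded values given in the text, and the variable names are illustrative only.

```python
# Share of total variance explained by each varifactor, as quoted in the text (%)
explained_percent = {"VF1": 34.0, "VF2": 16.9, "VF3": 13.0, "VF4": 9.5, "VF5": 7.8}

cumulative = []
running = 0.0
for name, pct in explained_percent.items():
    running += pct
    # Round to one decimal place to match the precision of the quoted shares
    cumulative.append((name, round(running, 1)))

total = cumulative[-1][1]
for name, value in cumulative:
    print(f"{name}: cumulative {value} %")
```

The rounded shares add up to 81.2 %, which agrees with the reported total of 81.3 % to within the rounding of the individual percentages.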

Conclusions

An analysis of water quality is a complex procedure; carrying out multivariate statistical treatments would make interpretation of the data easier (Singh et al. 2004). This study has investigated the effects of different point sources in the Coruh Basin on the variations in water quality, employing ANOVA, principal component and factor analyses as the multivariate statistical methods.

The box-whisker plot analysis is a strong tool not only for evaluating the sources of the variables but also for assessing the parameters associated with these sources. It indicated that the EC, TDS, SO4, TH, turbidity, SS, Ca, K and Mg values were higher at the sampling location downstream of the Murgul Stream Black Sea Copper Company than at the other sampling points, and that the study data had a wide distribution.

From the results of the primary statistical evaluation, it can be concluded that the water samples analysed should be categorised as quality grade III based on SO4 values, grade III based on NO3 values and grade IV based on the measurements of NO2, turbidity, total phosphorus and colour, in line with the Water Pollution Control Regulation published in Official Gazette No. 25687 on 31 December 2004 and the limit values given in TS 266. With respect to European Directive 75/440/EEC, the statistical analysis of the annual mean water quality values indicated that, between 2011 and 2013, the collected samples would have been classified as grade III with respect to their TDS, SO4 and colour parameters. Evaluation of the water quality data between the different sampling points showed that the samples collected at all locations had grade III water quality with respect to their TDS, SO4 and colour parameters. According to the measured BOD values, on the other hand, the water samples qualified as grade II in terms of water quality.

One-way ANOVA was performed to compare the sampling points for each parameter, while the Tukey test was used to identify the sampling locations where water quality parameters differed significantly from the other locations. The latter analysis was important in that it illustrated the water quality differences between the sampling locations. The results of the ANOVA test revealed differences among the four measurement points, given the value of p < 0.05 attained for the pH, EC, TDS, K, TH, Ca, HCO3, T-Alk, turbidity, SS and colour parameters. The water quality measured at the Muratli and Borcka dam sites appears to be similar, which can be attributed to the pollution sources at both sites being domestic and agricultural in origin. The results of the study point to the conclusion that there is a statistically significant difference between the water characteristics at the sources of wastewater from the copper companies and those at the other measurement points. From the results of the PCA/FA, it can be concluded that the quality of the water is significantly affected by the copper mining operations and by domestic and industrial use, as well as by the specific physical and chemical properties of the water.

The principal component analysis (PCA) and factor analysis (FA) were used to separate into different groups the parameters that determined the latent factors affecting water quality. By applying varimax rotation to the results of these analyses, we determined that copper mining activities, the presence of domestic and industrial wastewaters, and the physicochemical characteristics of water affected water quality.

The results of this study demonstrated that the use of multivariate statistical analysis methods such as ANOVA as well as PCA/FA is suitable for interpreting complex water quality data sets, understanding the time-/location-related changes in water quality and determining hidden sources of pollution. The outputs of this study will therefore contribute to determining the main sources of pollution in different regions within the framework of activities aiming to improve river water quality. Multivariate statistical methods can be effectively used in interpreting large data sets and determining environmental impacts within the scope of river water quality monitoring programs.