Introduction

Environmental protection has gained considerable importance against the background of the harmful effects of environmental change on humans. Anthropogenic activities have led to the worldwide deterioration of not only air quality but also water quality (Li et al. 2014; Purushotham et al. 2011). Coal burning, both indoor and outdoor, produces carbon emissions that accelerate the greenhouse effect. The resulting climate change affects the environment in general as well as forest, coastal, and urban areas (Cetin 2016; Cetin et al. 2017; Sevik et al. 2017). In recent years, because of the severe pollution arising from coal, its use has been reduced all over China, particularly in several cities. However, mining cannot be completely stopped, and the hazards encountered during mining, such as mine collapse, water inrush, discharge, and contamination, should be dealt with carefully. Among these hazards, inrush, discharge, and contamination, together with the methods for dealing with them, are closely linked to water sources.

Various methods are available for solving problems related to water sources based on element characteristics. Hydrochemical analysis is usually the first step in dealing with water problems, such as the source identification of water inrush and contamination and the assessment of water quality. Other methods include fuzzy evaluation, multivariate statistical methods, distance discriminant, Bayes discriminant, and isotope tracking (Aravena et al. 1993; Dinka et al. 2015; Wang et al. 2016; Wu et al. 2017; Zhang et al. 2009). Hierarchical cluster analysis (HCA) is a powerful tool for grouping samples into significant clusters. Hence, this method was used in this study to test whether the water samples could be grouped into specific groups. The main function of principal component analysis (PCA) is to reduce a large number of variables into a few underlying factors to explain the variability of group characteristics (Yang et al. 2016a, b).

Geochemical characteristics of water depend on water–rock interaction processes, which are controlled by groundwater dynamics, lithology, and other factors (Wolkersdorfer 2008). Therefore, chemical analyses are commonly utilized for water-related research such as water quality assessment, water-contaminant source identification, and mineral water studies. Among such analyses, hydrochemical analysis is the fundamental method because hydrochemical characteristics can reflect the occurrence conditions. The need for a greater understanding of the chemical composition of water sources, particularly for elements that are not monitored on a regular basis, is the key issue for defining their quality. Hydrochemistry is widely used for water source identification and water quality assessment, for example in the use of stable isotopes and radioisotopes (Ji et al. 2017; Tallini et al. 2014), studies on the hydrogeochemical processes in groundwater (Armengol et al. 2017; Yang et al. 2016a), studies on the spatiotemporal variability of groundwater (Singh et al. 2015), and identification of pollution sources of groundwater or surface water (Datta et al. 2011; Telci and Aral 2011; Yang et al. 2016b). Piper trilinear diagrams and ion ratios are commonly used for these studies.

Mine water inrush may originate from a variety of sources such as sand water, limestone water, surface water, and stratifugic water. The coal-bearing strata belong almost completely to the Permo-Carboniferous system in North China, and with the increase in deep mining, the main controlling factors of water hazards related to limestone water become more important (Meng et al. 2012; Tripathy and Ala 2018; Wu et al. 2017). Hydrogeochemical characteristics vary with rock–water reactions such as leaching and mixing; therefore, these characteristics are directly related to the lithology (Ghesquière et al. 2015; Frape et al. 1984). If the hydrochemical method is used for water source identification, different lithologies can be identified, but water originating from similar lithologies (such as limestone water, particularly) cannot be as easily identified. Hence, the choice of the method for water source identification is crucial to mine safety.

The selection of parameters in hydrochemical analyses plays an important role in solving water-related problems. PCA can be combined with other methods to solve research problems such as water source identification (Zhang et al. 2012; Lu et al. 2012). Moreover, the unit or parameter selected differs among studies (Howladar 2017; Huang et al. 2017; Wen et al. 2014; Zhou et al. 2010). Although different methods have been compared, no comparison of the results obtained using different units is available. Unlike the unit milliequivalents per liter (mEq/L), the unit milligrams per liter (mg/L) indicates absolute concentration. Therefore, theoretically, using milliequivalents per liter is more appropriate when solving certain problems, as it can also indicate the characteristics of other elements present.

This study focuses on the research results obtained using a hydrochemical analysis method, either by itself or in combination with PCA, and the selection of parameters and units. The major objective is to study the identification accuracy of similar limestone water or similar sandstone water. Descriptive statistics, contrastive analysis, and statistical methods are used in this study to determine the influence of different methods, parameters, and units on identification results. PCA was used to reduce the number of chemical parameters, and HCA was used to test the sample grouping. The hydrochemical analysis result shows that the Piper and Stiff diagrams are useful for presenting sample characteristics, and the hydrogeochemical types are SO4-Na·Ca, SO4-Na, SO4·HCO3-Na, and HCO3-Na. Further, other conclusions were drawn to explain the results under different conditions.

Methods and materials

Study area

The Xinwen coalfield is located in Tai’an, Shandong Province, China (Fig. 1). It lies on the axis of the Xinwen syncline; therefore, faults are well developed here. There are 23 medium–large faults and numerous small faults in this area. The strata in the study area are interbedded terrestrial and marine coal-bearing deposits of the North China Permo-Carboniferous type. The average thickness of the confined limestone aquifers is less than 10 m, but the Ordovician limestone is about 800 m thick. The limestone serves as the roof aquifer or the floor aquifer and can easily act as a direct source of water inrush. Some water flow records of the Ordovician limestone over 4 years (2010–2013) are presented in Table 1.

Fig. 1 Geographic location of the Zhaizhen coalmine

Table 1 Water flow records over 2010–2013

Water sampling and hydrology analysis

Limestone water was sampled from the Ordovician limestone and Xujiazhuang limestone aquifers, and sandstone water was sampled from the Lower Jurassic sandstone and the roof sandstone of the lower seam no. 3 in the Jining coalmine. The physical parameters and major ions were determined for statistical evaluation.

The water samples were collected at points distributed across the studied aquifers, based on the borehole distribution in the mine area. Some physical and chemical indices, such as water temperature, pH, and conductivity, were measured on the spot by sensors. The filtered water samples were then sent to the testing center for analysis. The anions were determined by an ion chromatograph, and the cations were determined by an inductively coupled plasma emission spectrometer.

The Piper trilinear diagram (Piper 1944) is one of the most useful graphical representations in groundwater quality studies. This diagram helps one to understand the geochemical characteristics of groundwater. The Durov diagram improves upon the Piper trilinear diagram and incorporates total dissolved solids (TDS) and pH. The geochemical characteristics of groundwater can be clearly seen by plotting the cation and anion concentrations in the Piper trilinear diagram.

The Stiff diagram is also a graphical representation of chemical analyses (Stiff 1951). It is used to display the major ion composition of a water sample. A polygonal shape is created from four parallel horizontal axes extending on either side of a vertical zero axis. Cations and anions are plotted on opposite sides of the zero axis. Stiff diagrams are useful for making a rapid visual comparison of different water sources. They can help in determining flow paths or showing changes in the ionic composition of a water body over space or time.

Statistical analysis

Multivariate methods were used to process the sample data and compare the four types of water. To compare the water groups and the different methods, PCA and HCA were used. With HCA, the samples can be grouped effectively into significant clusters based on the chemical parameter data. Therefore, HCA was used to test the water samples in this study. PCA is useful for dimensionality reduction and was used to examine the relationships among parameters. SPSS 21.0 was used to analyze the water samples and perform the HCA and PCA calculations. The details of PCA are given below.

  1. Step 1:

    Center the original dataset by computing \( \mathbf{x}-\overline{\mathbf{x}} \), where \( \mathbf{x} \) is a d × 1-dimensional vector representing one sample and \( \overline{\mathbf{x}} \) is the d × 1-dimensional mean vector of the whole dataset.

  2. Step 2:

    Compute the d × d covariance matrix Σ of the centered dataset.

  3. Step 3:

    Compute the eigenvectors (ν1, ν2, …, νd) and the corresponding eigenvalues (λ1, λ2, …, λd) of the covariance matrix such that Σν = λν.

  4. Step 4:

    Sort the eigenvectors in order of decreasing eigenvalue and choose the top k eigenvectors to obtain a d × k-dimensional matrix V whose columns are the chosen eigenvectors.

  5. Step 5:

    Project the dataset into the new subspace by computing \( \mathbf{y}={\mathbf{V}}^{\mathrm{T}}\mathbf{x} \), where y is the transformed k × 1-dimensional sample in the new subspace.
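The five steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the SPSS procedure used in the study: the function name `pca_project` and the random input data are hypothetical, and the projection is applied to the centered samples. Standardizing each parameter before Step 1 would give the correlation-matrix variant of PCA.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (n samples x d parameters) onto the top-k principal axes."""
    X = np.asarray(X, dtype=float)
    # Step 1: center every parameter on its mean.
    centered = X - X.mean(axis=0)
    # Step 2: d x d covariance matrix of the centered data.
    cov = np.cov(centered, rowvar=False)
    # Step 3: eigen-decomposition (eigh is meant for symmetric matrices
    # and returns eigenvalues in ascending order).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: sort by decreasing eigenvalue and keep the top k eigenvectors.
    order = np.argsort(eigvals)[::-1][:k]
    V = eigvecs[:, order]            # d x k
    # Step 5: y = V^T x for every sample, written row-wise as X_centered @ V.
    Y = centered @ V                 # n x k
    return Y, eigvals[order], V

# Hypothetical data: 20 samples, 6 chemical parameters (not the study's data).
rng = np.random.default_rng(0)
samples = rng.normal(size=(20, 6))
scores, variances, axes = pca_project(samples, k=2)
```

Each column of `scores` is one principal component, and `variances` holds the matching eigenvalues, that is, the variance carried by each component.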

Results

Hydrochemical characteristics

The hydrochemical characteristics of Ordovician limestone water (I), Xujiazhuang limestone water (II), Lower Jurassic sand water (III), and sand water in the roof of the lower seam no. 3 (IV) as obtained using the descriptive statistical method are listed in Tables 2 and 3. Graphical representations of the chemical analyses of the four types of water are shown in Fig. 2, which contains the Piper, Durov, and Stiff diagrams.

Table 2 Hydrochemical characteristics of limestone water
Table 3 Hydrochemical characteristics of sandstone water
Fig. 2 Graphical representation of chemical analyses. a Piper, b Durov, and c Stiff diagrams

The listed data and analyzed results show that the major cations of Ordovician limestone water and sand water are dominated by Na and Ca. The major anion of the two types of limestone water is dominated by SO4. The major anions of Lower Jurassic sand water are SO4 and HCO3, and the leading anion of sand water in the roof of the lower seam no. 3 is HCO3. The major cations contribute 28.53%, 25.85%, 19.81%, and 26.99% of the total dissolved solids of water types I–IV, respectively. The major anions contribute 59.18%, 60.95%, 59.53%, and 65.58% of the total dissolved solids of water types I–IV, respectively. The standard deviation and coefficient of variation reflect the dispersion degree of the data.

There is a slight difference between the two types of limestone water. Limestone is a carbonate deposit, and the water–rock interactions are closely related to the lithology, for example whether the rock is dolomitic limestone or simply limestone. Comparison of the two types of sand water shows that they differ even though both originate from sand aquifers. The differences depend on factors such as time, flow channels, and temperature, which can affect the rock–water interactions. The percentage of sodium and the sodium adsorption ratio can be used to assess water quality.

To identify the four types of water clearly, Table 4 presents the sample analysis results: water type, conductivity, salinity hazard, sodium adsorption ratio (SAR), and exchangeable sodium ratio (ESR). Table 4 also lists the features of the four types of water: the two limestone water samples differ in water type and conductivity, and the two sandstone water samples have clearly different water types, SAR, and ESR. Thus, the two sandstone water types are easily distinguished, whereas discriminating between the two limestone water types is comparatively difficult. Limestone water and mixed water identification still remains a challenge for mine water inrush source identification.
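For orientation, the sodium indices named above can be computed as follows. The source does not state which formulas were used, so the definitions below are assumptions based on the commonly used forms, with all concentrations in meq/L: SAR = Na / sqrt((Ca + Mg) / 2), ESR = Na / (Ca + Mg), and sodium percentage = 100 (Na + K) / (Na + K + Ca + Mg). The function name and the sample values are hypothetical.

```python
import math

def sodium_indices(na, k, ca, mg):
    """Return (SAR, ESR, sodium percentage); all inputs are in meq/L."""
    sar = na / math.sqrt((ca + mg) / 2.0)            # sodium adsorption ratio
    esr = na / (ca + mg)                             # exchangeable sodium ratio
    na_percent = 100.0 * (na + k) / (na + k + ca + mg)
    return sar, esr, na_percent

# Hypothetical sample, not taken from Table 4.
sar, esr, na_pct = sodium_indices(na=8.0, k=0.2, ca=3.0, mg=1.0)
print(f"SAR = {sar:.2f}, ESR = {esr:.2f}, Na% = {na_pct:.1f}")
```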

Table 4 Data analysis of the four types of water

The data in Tables 3 and 4 cannot provide any visual information about the water samples. The hydrochemical analysis results presented in Fig. 2 clearly show the differences between the water types. The water quality differs between the sandstone aquifers, and sandstone water is also easily distinguished from limestone water. Limestone water from different limestone aquifers can be distinguished on the basis of the Durov and Stiff diagrams.

Hierarchical cluster analysis

HCA is used to group all the water samples into several significantly different clusters. Because it is known that there are four types of water, HCA can be used to test the water sample data and determine whether the samples can be grouped into hydrochemical groups. Anomalous samples can therefore be filtered out before analysis to reduce the errors caused by improper data.
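A minimal sketch of this kind of grouping test is given below, using SciPy instead of SPSS. The source does not report the linkage method or distance measure, so Ward linkage on standardized data is an assumption here, and the two synthetic groups are hypothetical stand-ins for water types.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Hypothetical data: two well-separated groups, 5 samples x 3 parameters each.
group_a = rng.normal(loc=0.0, scale=0.1, size=(5, 3))
group_b = rng.normal(loc=5.0, scale=0.1, size=(5, 3))
data = np.vstack([group_a, group_b])

# Standardize each parameter so that no single ion dominates the distances.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Build the cluster tree (Ward linkage is an assumption, see above)
# and cut it into the expected number of groups.
tree = linkage(z, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

Samples whose cluster label disagrees with their known aquifer are the candidates for exclusion described above.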

Figure 3 shows the dendrogram generated using HCA. There are five exceptional water samples (1, 3, 30, 41, and 43): samples 1 and 3 should belong to I, and samples 30, 41, and 43 should belong to III. Furthermore, the following information is derived: group 1 includes all six water samples of type IV, and group 2 includes 17 water samples of type III, with the other three samples of this type (30, 41, and 43) excluded. Among the limestone water samples, four samples are misplaced and wrongly grouped. Once the cluster standards of HCA are known, samples 1, 3, 30, 41, and 43 can be removed from the full set of water samples before hydrochemical analysis.

Fig. 3 Dendrogram generated from the HCA of water chemistry data

Principal component analysis

PCA is a statistical technique useful for finding patterns in data represented in high dimensions, and the main process of PCA is to find the directions that maximize the variance in the dataset (Oh and Hildreth 2016; Verma 2013). As a multivariate data analysis technique that mainly studies the inter-structure of the correlation matrices of parameters, its fundamental purpose is compressing the original data to achieve dimensionality reduction. Therefore, combinations of PCA and other mathematical methods are used to solve a variety of problems.

PCA was used to reduce the number of water chemical parameters and to choose the principal components as a new evaluation index system for hydrochemical analysis. One objective of the study is to check whether the results differ when different parameters are chosen. Therefore, two parameter sets were chosen for PCA. One case is used here as an example to illustrate the process. Two components were extracted. Their eigenvalues were 3.97 and 1.07; the two components explain 66.18% and 17.85% of the total variance, respectively, giving a cumulative explained variance of 84.03%. The component matrix and the rotated component matrix are presented in Table 5.
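The relation between the eigenvalues and the variance percentages can be checked with a short calculation. It assumes that PCA was run on the correlation matrix of six standardized parameters, so that the total variance equals 6. This parameter count is not stated in the source; it is inferred because it reproduces the reported percentages to within rounding.

```python
# Eigenvalues reported in the text.
eigenvalues = [3.97, 1.07]
# Assumption: six standardized parameters, so the total variance is 6.
n_parameters = 6

shares = [100.0 * ev / n_parameters for ev in eigenvalues]
cumulative = sum(shares)

for ev, share in zip(eigenvalues, shares):
    print(f"eigenvalue {ev:.2f} -> {share:.2f}% of total variance")
print(f"cumulative: {cumulative:.2f}%")
```

The small differences from the reported 66.18%, 17.85%, and 84.03% are consistent with the eigenvalues having been rounded to two decimals.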

Table 5 Component matrix and varimax rotated component matrix

The first component was correlated with the ion concentrations of Ca, Mg, and SO4. The second component was correlated with the ion concentrations of Na + K and HCO3. The eigenvector of the principal component was constructed based on a component matrix, and then the expression of principal components was derived.

Discussion

Identification with different parameters

In this study, two parameter groups were chosen to study the influence of parameter selection on the analysis results. One group (G1) contained seven major ions (Na + K, Ca, Mg, Cl, SO4, and HCO3), and the other group (G2) contained the parameters Na + K, Ca, Mg, Cl, SO4, HCO3, Fe, NH4, TDS, and pH. The samples remaining after the four particular cases were excluded were used for the contrastive analysis. The comparative results are presented in Table 6.

Table 6 Comparative results with different identification indices

Table 6 shows that the two types of sandstone water were correctly classified with both sets of identification indices, whereas six and three limestone water samples were misidentified with G1 and G2, respectively. According to the analysis results, the return discriminant ratios (the share of samples assigned back to their known water type) of G1 and G2 are 88% and 94%, respectively. Further, an analysis using parameters different from G1 and G2 was conducted; when Na + K, Ca, Mg, Cl, SO4, HCO3, pH, and TDS are used as the identification indices, the classification results are the same as those for G1. The analysis results provide further proof that sandstone water can be identified clearly. The results also show that some significant trace ions can affect the classification to a certain extent.
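The return discriminant ratio can be illustrated with a short calculation. The total number of samples is not stated in this section; 50 is an assumption, inferred because it reproduces the reported 88% and 94% from six and three misidentified samples. The function name is hypothetical.

```python
def return_discriminant_ratio(n_samples, n_misidentified):
    """Percentage of samples assigned back to their known water type."""
    return 100.0 * (n_samples - n_misidentified) / n_samples

N_SAMPLES = 50  # assumption, inferred from the reported ratios (see lead-in)

ratio_g1 = return_discriminant_ratio(N_SAMPLES, 6)  # six misidentified with G1
ratio_g2 = return_discriminant_ratio(N_SAMPLES, 3)  # three misidentified with G2
print(f"G1: {ratio_g1:.0f}%, G2: {ratio_g2:.0f}%")
```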

Identification with different units

Milligrams per liter and milliequivalents per liter are different forms of ion concentration: the former indicates the absolute concentration of ions, and the latter is a relative concentration. Hence, the latter can better reflect the other trace ions. Both units can be used to analyze water characteristics, but no comparative analysis has been reported. One objective of this study is to check whether the two units lead to different or similar research results.
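The conversion between the two units is a simple scaling by ionic charge and molar mass: meq/L = mg/L × |charge| / molar mass. The sketch below uses rounded standard molar masses; the dictionary, the function name, and the sample concentrations are hypothetical and are not taken from the study's tables.

```python
# Rounded molar mass (g/mol) and absolute ionic charge of the major ions.
ION_DATA = {
    "Na": (22.99, 1),
    "K": (39.10, 1),
    "Ca": (40.08, 2),
    "Mg": (24.305, 2),
    "Cl": (35.45, 1),
    "SO4": (96.06, 2),
    "HCO3": (61.02, 1),
}

def mg_to_meq(ion, mg_per_l):
    """Convert a concentration from mg/L to meq/L."""
    molar_mass, charge = ION_DATA[ion]
    return mg_per_l * charge / molar_mass

# Hypothetical sample in mg/L.
sample_mg = {"Na": 230.0, "Ca": 40.08, "SO4": 480.0, "HCO3": 122.0}
sample_meq = {ion: mg_to_meq(ion, value) for ion, value in sample_mg.items()}
for ion, value in sample_meq.items():
    print(f"{ion}: {sample_mg[ion]:.1f} mg/L = {value:.2f} meq/L")
```

Because each ion is scaled by a different factor, the relative weight of the parameters changes between the two units, which is one way the choice of unit can influence a discriminant result.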

According to the analysis results, using the parameters in G1 without TDS, the return discriminant ratios obtained with milligrams per liter and milliequivalents per liter are 94% and 84%, respectively. Table 7 shows that the two types of sandstone water were all correctly classified, whereas three and eight limestone water samples were misidentified, respectively. Further, an analysis with the unit milliequivalents per liter but without the parameters Fe and NH4 was performed. The results obtained were identical to those of G4. In this case, the use of milliequivalents per liter appears inappropriate.

Table 7 Comparative results with different units

Identification with different combined methods

PCA can be combined with other mathematical methods, either for data preprocessing or for its parameter-reduction function. This study examined whether PCA is suitable for hydrochemical analysis and under what conditions a combination of PCA with other methods is appropriate. With regard to the return discriminant ratios of G1, G2, G3, and G4 and the correlation of the identification indices, the data in G1 were used to present this study item. Ten identification indices (Na + K, Ca, Mg, Cl, SO4, HCO3, Fe, NH4, TDS, and pH) were labeled as x1, x2, x3, x4, x5, x6, x7, x8, x9, and x10, respectively. Table 8 presents the comparative results of the different methods.

Table 8 Comparative results of the different methods

Table 9 presents the results of the Kaiser–Meyer–Olkin (KMO) test and Bartlett’s test of sphericity, which indicate whether the sampling used for the factor analysis was adequate. The value of KMO is 0.5. Because the significance value (p value) of Bartlett’s test is reported as 0, which is < 0.01, the result is significant, and the correlation matrix is not an identity matrix. Thus, it may be concluded that the factor model is appropriate. According to the operating steps mentioned in the section “Principal component analysis,” three principal components are extracted, and the principal component expressions are derived as shown in Eqs. (1)–(3). The analysis results are presented in Table 8. The table shows that the return discriminant ratio of the combined PCA method is 88%, and that of the method without PCA is 94%.

$$ X_1=-0.18x_1+0.31x_2+0.45x_3+0.32x_4+0.41x_5-0.38x_6+0.21x_7-0.27x_8+0.21x_9-0.29x_{10} \tag{1} $$
$$ X_2=0.60x_1+0.20x_2+0.09x_3+0.20x_5+0.25x_6+0.01x_7+0.37x_8+0.60x_9+0.04x_{10} \tag{2} $$
$$ X_3=0.17x_1-0.19x_2+0.08x_3+0.43x_4-0.21x_5+0.37x_6+0.68x_7-0.25x_8-0.08x_9+0.17x_{10} \tag{3} $$
Table 9 KMO and Bartlett’s test
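Bartlett’s test of sphericity, discussed above, can be reproduced outside SPSS. The sketch below uses the standard statistic χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, where R is the correlation matrix of p parameters measured on n samples. The function name and the synthetic data are hypothetical; the KMO measure is not computed here.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    """Return (chi-square statistic, degrees of freedom, p value)."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet is numerically safer than log(det(...)) for near-singular matrices.
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    p_value = chi2.sf(statistic, dof)
    return statistic, dof, p_value

# Hypothetical data: 50 samples of 4 strongly correlated parameters.
rng = np.random.default_rng(2)
base = rng.normal(size=(50, 1))
correlated = base + 0.1 * rng.normal(size=(50, 4))
stat, dof, p_value = bartlett_sphericity(correlated)
print(f"chi2 = {stat:.1f}, dof = {dof}, p = {p_value:.3g}")
```

A p value below 0.01 rejects the hypothesis that the correlation matrix is an identity matrix, which is the condition described in the text.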

Limitation of the work and future study

In this paper, the identification results of water sources under different conditions are compared. The major purpose is to study the identification accuracy for similar limestone water or similar sandstone water. Descriptive statistics, contrastive analysis, and statistical methods are used to determine the influence of different methods, parameters, and units on the identification results. The research shows that these differences do change the identification results. PCA is the most common method for eliminating the linear correlation between parameters, but it was not always appropriate for analyzing the data. In addition, the choice of the original data and their representation are important to the results. Different methods and discrimination indices are widely used in such studies, but whether they are appropriate is seldom examined. This study may prompt discussion of this issue.

In this study, three conditions are considered in discussing the discrimination model, but other factors must also have contributed to the different results. Although many methods are now quite mature, the conditions under which they apply still require continued in-depth research. Water problems account for most mine hazards and have always been a focus of research. Hydrochemical analysis is fundamentally necessary for dealing with such issues. Therefore, future studies should consider how to make the analysis results more accurate.

Conclusions

To study the application of hydrochemical analysis in water source identification, especially for similar water, samples of limestone water from the Ordovician limestone and Xujiazhuang limestone aquifers and samples of sandstone water from the Lower Jurassic sandstone and the roof sandstone of the lower seam no. 3 were obtained and analyzed. Graphical representations are used to show the hydrochemical characteristics of the four types of water. HCA was used to group the water samples into clusters and to test them. PCA was used to reduce the number of water chemical parameters and compare the different parameters.

According to the results of hydrology analysis, limestone water can be easily distinguished from sandstone water based on the Piper and Stiff diagrams, and the two types of sandstone water (III and IV) differ from each other more clearly than the two types of limestone water (I and II). Further, the hydrochemical types of the four types of water are SO4-Na·Ca, SO4-Na, SO4·HCO3-Na, and HCO3-Na, respectively.

HCA is an effective method for sample grouping and was used to test the water samples. The cluster results show that most of the water samples are correctly grouped. Several cases were excluded in the analysis to reduce errors.

The three groups of contrastive analyses show that trace ions can affect the discriminant results. Although the unit milliequivalents per liter indicates relative concentration, using it did not improve the discriminant results. According to the KMO value and the significance value of Bartlett’s test, the data used in this paper are appropriate for factor analysis; however, the method combined with PCA gave a lower return discriminant ratio than the method applied to the original data.