Introduction

Mine water inrush is one of the most severe types of coal mine accident in China, and the main cause of such accidents is insufficiently in-depth research on the hydrogeological conditions of collieries. According to statistics from China’s State Administration of Coal Mine Safety, about 905 mines in China have complex hydrogeological conditions (Zhang et al. 2020; Wang et al. 2020; Sun et al. 2015; Xu et al. 2020). Variations in the hydrochemical parameters of the underground multi-aquifer system directly reflect mine water inrush (Yin et al. 2019; Qian et al. 2017). Therefore, accurate and rapid identification of water inrush sources in the multi-aquifer system is essential for protecting miners' lives and maintaining safe production in collieries.

Mining causes changes in groundwater level, temperature and hydrochemical composition, which can be analyzed using various techniques, including the water temperature and water level methods, hydrochemical analysis and mathematical analysis. Because of the geothermal gradient, aquifer temperature differs with depth. Sui et al. (2010) compared the water temperature at the water inrush point with that of the aquifers posing a potential inrush hazard, which allowed a preliminary prediction of the source of mine water inrush. Lin et al. (2014) conducted a dewatering test on the aquifer, identified the recharge channel from the change in water level, and determined the potential source of mine water inrush. Wang and Shi (2019) used Piper, Durov, and Stiff diagrams to distinguish four types of water sources. Li et al. (2016) obtained conventional ion concentrations through field sampling tests and, by combining these results with multivariate statistical analysis of the hydrochemical effect of the aquifer, determined the seawater infiltration channel and the water inrush source. The stable isotopes δD and δ¹⁸O play an important role in analyzing the origin and formation of groundwater in aquifers. They can be used not only to determine the relationships among groundwater, precipitation and surface water, but also to analyze the recharge sources and mixing ratios of groundwater (Chafouq et al. 2018; Yi et al. 2018; Boumaiza et al. 2020; Cao et al. 2020; Liao et al. 2020). Guan et al. (2019) cross-validated stable isotope data against hydrogeochemical analysis to accurately identify the source of water inrush in Mingdon mine (China), but reported that most coal enterprises rarely use the method because of its high cost. In recent years, mathematical and statistical analysis methods have also developed rapidly and are widely used to identify hazards related to water inrush (Liu et al. 2019; Wang et al. 2020). 
Based on cluster analysis, Zhang et al. (2019a, b) established a multiple logistic regression recognition model and used it to identify and verify the water inrush aquifer in Qinan coal mine, China. Huang et al. (2019) established the Piper-PCA-Fisher water source recognition model, which is more accurate than either the Piper diagram method or the Fisher discriminant method alone. Based on principal component analysis (PCA) and a BP neural network, Yang et al. (2019) proposed a water source discrimination model for a mine water field monitoring system, which was applied to Lijiazui mine in Huainan (China) and achieved an accuracy of 91%. Dong et al. (2019) combined Fisher feature extraction with the support vector machine (SVM) method and applied the combined model to the Wuhai mining area (China); it discriminated water inrush sources more accurately and efficiently than the traditional SVM model. However, this method requires a large amount of water sample data and cannot identify multiple water inrush sources simultaneously.

Mathematical analysis methods are simple to operate, efficient in discrimination, objective, and accurate at low cost. Therefore, based on the mathematical theory of grey situation decision-making, this paper proposes a new model for discriminating water inrush sources that combines principal component analysis with the entropy weight-grey situation decision method. In the model, principal component analysis eliminates redundant variables and reduces the workload, and the entropy weight method assigns a weight to each variable to characterize its degree of variation. Finally, the grey situation decision method reduces the multi-factor evaluation to single-objective decision-making, which overcomes the inability of single-factor evaluation to reflect the water quality characteristics of water inrush sources. Since Xieqiao mine in the Huainan coalfield (China) began production, 24 water inrush accidents have occurred under the influence of coal mining, seriously affecting miner safety and the mine economy. Water samples were collected in potential water inrush areas where water was trickling from the coal wall, and their hydrochemical variables were input into the model to predict the source of the inflowing water. Timely investigation and treatment of the water-filling channels and the water-filling intensity of the aquifer can effectively prevent water inrush accidents. Because the study relies only on quantitative analysis of hydrochemical information, the model can also be applied to water source discrimination in other similar mines, indicating wide applicability.

Hydrogeological conditions in the study area

Xieqiao coal mine is located in the northeast of Yingshang County, Fuyang City, Anhui Province, China (Fig. 1). The terrain of the mining area is flat and belongs to the Huaihe alluvial plain. The study area has a transitional climate with distinct seasons, featuring hot summers and cold winters. The annual average temperature is 15.1 °C, and the annual average rainfall is 926.3 mm. Rainfall is concentrated in June, July and August, which together account for about 40% of the annual total. The annual average evaporation is 1610.14 mm, which exceeds the rainfall, and the humidity coefficient is approximately 0.5.

Fig. 1 Location of the study area, sampling sites and the aquifer profile

The groundwater system of the Xieqiao mining area consists of four subsystems: the Cenozoic loose-layer pore aquifer, the Permian coal-bearing sandstone fissure aquifer, the Carboniferous limestone karst fissure aquifer and the Ordovician limestone karst fracture aquifer. The hydrogeological characteristics of the aquifers and aquifuges are shown in Fig. 1. A thick clay layer covering the coal measures separates the coal-bearing sandstone fissure aquifer from the loose-layer pore aquifer. In addition, the limestone aquifers of the Taiyuan Formation lie on average 16.44 m below the coal floor. Therefore, under normal conditions, the three aquifers have no direct water-filling effect on one another.

Sampling and testing

Each sample bottle was rinsed 2–3 times with water before sampling. The bottles were not filled completely; a headspace of about 5–10 mL was left at the top (Huang et al. 2019; Guo et al. 2019). The water samples were kept at low temperature to prevent chemical reactions (Zhang et al. 2019a, b; Zhang et al. 2016). Eight variables (K⁺ + Na⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, HCO₃⁻, pH and TDS) were measured. The pH of each sample was measured with a Hanna portable pH meter within 5 min of collection. Samples for cation analysis were acidified with nitric acid to pH ≤ 2. All tests were conducted within 24 h of sampling at the Quality Inspection Center, Anhui University of Science and Technology, China. Cl⁻, SO₄²⁻ and HCO₃⁻ were determined by ion chromatography, and K⁺ + Na⁺, Ca²⁺ and Mg²⁺ by inductively coupled plasma mass spectrometry. To check the reliability of the results, the anion–cation balance was calculated, and the balance error was confirmed to lie within ±5%. As shown in Table 1 and Fig. 1, a total of 37 training water samples were collected in Xieqiao mine between 2005 and 2018: five from the Cenozoic aquifer, 14 from the Permian aquifer and 18 from the Carboniferous aquifer. To verify the accuracy of the model, 14 validation samples were also collected. The symbols Q, P and C denote the Cenozoic, Permian and Carboniferous aquifers, respectively, and X1, X2, X3, X4, X5 and X6 denote Na⁺ + K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻ and HCO₃⁻, respectively.
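The anion–cation balance check above is commonly expressed as the charge balance error, CBE = (Σcations − Σanions) / (Σcations + Σanions) × 100%, with concentrations converted to meq/L. The sketch below assumes concentrations in mg/L; because K⁺ + Na⁺ was reported as one value, it converts that sum with the equivalent weight of Na⁺, which is an assumption rather than something the source states. The ion keys and the helper names `charge_balance_error` and `is_reliable` are hypothetical.

```python
# Equivalent weights in mg/meq (molar mass divided by the absolute ionic charge).
EQUIV_WEIGHT = {
    "Na+K": 22.99,       # assumption: combined Na+ + K+ converted as Na+
    "Ca": 40.08 / 2,
    "Mg": 24.305 / 2,
    "Cl": 35.45,
    "SO4": 96.06 / 2,
    "HCO3": 61.02,
}
CATIONS = ("Na+K", "Ca", "Mg")
ANIONS = ("Cl", "SO4", "HCO3")


def charge_balance_error(mg_per_l: dict) -> float:
    """Charge balance error (%) of one water sample given in mg/L."""
    # Convert every ion to meq/L; missing ions count as zero.
    meq = {ion: mg_per_l.get(ion, 0.0) / w for ion, w in EQUIV_WEIGHT.items()}
    cations = sum(meq[ion] for ion in CATIONS)
    anions = sum(meq[ion] for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def is_reliable(mg_per_l: dict, limit: float = 5.0) -> bool:
    """True if the sample passes the +/-5% balance criterion."""
    return abs(charge_balance_error(mg_per_l)) <= limit
```

A sample whose cations supply 2 meq/L against 1 meq/L of anions gives a CBE of about 33%, far outside the ±5% window, and would be flagged as unreliable.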

Table 1 Water samples from the Xieqiao coal mine, China

Methods

Principal component analysis

The p variables X1, X2, …, Xp of the original data matrix X were transformed into new variables by the linear combination Y = AX. The relationship between the original and new variables is given by Eq. (1).

$$\left\{\begin{array}{c}{Y}_{1}={a}_{11}{X}_{1}+{a}_{12}{X}_{2}+\cdots +{a}_{1p}{X}_{p}\\ {Y}_{2}={a}_{21}{X}_{1}+{a}_{22}{X}_{2}+\cdots +{a}_{2p}{X}_{p}\\ \cdots \\ {Y}_{p}={a}_{p1}{X}_{1}+{a}_{p2}{X}_{2}+\cdots +{a}_{pp}{X}_{p}\end{array}\right.,$$
(1)

where \({a}_{i1}^{2}+{a}_{i2}^{2}+\cdots +{a}_{ip}^{2}=1\) (i = 1, 2, …, p); Yi and Yj (i ≠ j) are uncorrelated; Y1 has the largest variance among all linear combinations of X1, X2, …, Xp; and Y2 has the largest variance among all linear combinations of X1, X2, …, Xp that are uncorrelated with Y1, and so on. Moreover, the sum of the variances of Y1, Y2, …, Yp is equal to the sum of the variances of X1, X2, …, Xp.

The general steps for solving the principal components are as follows.

The original variable data were standardized, and the covariance matrix Σ of the variables was calculated. The eigenvalues of Σ were λ1 ≥ λ2 ≥ … ≥ λp, and the corresponding unit eigenvectors were T1, T2, …, Tp. The transformation matrix is A = T′, whose i-th row is the unit eigenvector Ti corresponding to the i-th largest eigenvalue of Σ. The variance of the i-th principal component Yi equals the i-th largest eigenvalue λi of Σ. The variance contribution rate of the k-th principal component Yk is \({\eta }_{k}=\frac{{\lambda }_{k}}{{\sum }_{i=1}^{p}{\lambda }_{i}}\). If m (m < p) principal components are selected, the cumulative contribution rate of Y1, Y2, …, Ym is \({\xi }_{m}={\sum }_{k=1}^{m}{\lambda }_{k}/{\sum }_{k=1}^{p}{\lambda }_{k}\). The principal components whose cumulative contribution rate reached 75% or more were retained (Price et al. 2006; He et al. 2016; Cloutier et al. 2008).
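The PCA steps above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: it standardizes each variable, eigen-decomposes the covariance matrix of the standardized data with `numpy.linalg.eigh`, sorts the eigenvalues in descending order, and keeps the smallest m whose cumulative contribution rate ξm reaches the threshold (0.75 here, following the text). The function name `pca` and its return layout are assumptions.

```python
import numpy as np


def pca(X, threshold=0.75):
    """PCA on standardized data.

    X: (n_samples, p) array of raw hydrochemical variables.
    Returns the contribution rates eta_k, cumulative rates xi_m, the number m
    of retained components, the transformation matrix A (m x p) and the scores.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    sigma = np.cov(Z, rowvar=False)                    # covariance of standardized data
    eigvals, eigvecs = np.linalg.eigh(sigma)           # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]                  # lambda_1 >= ... >= lambda_p
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eta = eigvals / eigvals.sum()                      # contribution rate of each Y_k
    xi = np.cumsum(eta)                                # cumulative contribution rate
    m = min(int(np.searchsorted(xi, threshold)) + 1, len(xi))
    A = eigvecs[:, :m].T                               # rows are unit eigenvectors T_i'
    scores = Z @ A.T                                   # principal component scores
    return eta, xi, m, A, scores
```

Because the data are standardized first, Σ here equals the correlation matrix, so the eigenvalues sum to p and no single ion dominates merely because of its concentration scale.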

Entropy weight

According to information theory, information measures the degree of order of a system, whereas entropy measures its degree of disorder. The entropy value reflects how much the concentration of each ion differs among water samples, and the corresponding weight of each ion can be obtained with the entropy weight method (Chen et al. 2019; Fausto et al. 2019; Liang et al. 2019). The calculation steps are as follows.

Suppose there are m water samples to be evaluated and n evaluation ions, which form the initial matrix \(R={({r}_{ij})}_{m\times n}\), where \({r}_{ij}\) is the value of the j-th ion in the i-th water sample (i = 1, 2, 3, …, m; j = 1, 2, 3, …, n).

The proportion pij was calculated using Eq. (2).

$${p}_{ij}=\frac{{r}_{ij}}{\sum_{i=1}^{m}{r}_{ij}}.$$
(2)

The entropy of the j-th ion was calculated using Eq. (3).

$${e}_{j}=-k\sum_{i=1}^{m}{p}_{ij}\,\mathrm{ln}\,{p}_{ij},\quad k=\frac{1}{\mathrm{ln}\,m}.$$
(3)

The entropy weight of the j-th ion was calculated using Eq. (4).

$${\omega }_{j}=\frac{\left(1-{e}_{j}\right)}{\sum_{j=1}^{n}\left(1-{e}_{j}\right)}.$$
(4)
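Eqs. (2)–(4) translate directly into array operations. The sketch assumes every rij is non-negative and uses the usual convention 0·ln 0 = 0. Principal component scores such as P2 and P3 can be negative, and the text does not say how such values were handled before weighting, so a transformation (for example a shift or min-max rescaling) would have to be chosen; that step is deliberately left out here. The function name `entropy_weights` is hypothetical.

```python
import numpy as np


def entropy_weights(R):
    """Entropy weights following Eqs. (2)-(4).

    R: (m, n) array, m water samples (rows) by n indices (columns).
    Assumes every r_ij >= 0 and every column has a positive sum.
    """
    R = np.asarray(R, dtype=float)
    if np.any(R < 0):
        raise ValueError("this sketch assumes non-negative r_ij")
    m = R.shape[0]
    P = R / R.sum(axis=0)                              # Eq. (2): p_ij
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)    # convention: 0 * ln 0 = 0
    e = -plogp.sum(axis=0) / np.log(m)                 # Eq. (3) with k = 1 / ln m
    d = 1.0 - e                                        # degree of divergence
    return d / d.sum()                                 # Eq. (4)
```

An index that takes the same value in every sample has entropy 1 and therefore receives zero weight, which is the intended behavior: it cannot help distinguish water sources.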

Entropy weight and grey situation decision method

A situation is the binary combination of an event and a countermeasure. Taking one event as the core and gathering similar events around it forms a grey event, for which countermeasures are studied; this is the idea behind grey situation decision-making (Zu et al. 2018; Zhang et al. 2014). In the identification of mine water inrush, each identification index was regarded as a grey element, each sample to be identified as an event, and the different water source categories as the countermeasures (Li et al. 2019; Fu 2016; He and Gong 2013). The optimal situation was determined through decision analysis, and the water source category corresponding to the optimal situation was taken as the evaluation result. In general, there is no preference among the countermeasures; however, different objectives affect the optimization differently, and decision makers' differing preferences for the objectives lead to unequal objective weights. In traditional grey situation decision-making, treating all objectives equally cannot reflect the decision makers’ preferences or the actual conditions of the decision problem. Therefore, entropy weights were assigned to the indices, which improves the resolution between situations and makes the decision results more accurate and reasonable. The main steps of the model are as follows.

Events ai (i = 1, 2, …, n) and countermeasures bj (j = 1, 2, …, m) were determined, the situations \({s}_{ij}=({a}_{i},{b}_{j})\) were constructed, and the situation array was established. Objectives p (p = 1, 2, …, q) were then specified. A situation effect measurement matrix was constructed for each objective p, with its elements obtained from the membership function of the corresponding index. The situation effect measurement matrices are defined by Eq. (5), in which each element \({r}_{ij}^{(p)}/{s}_{ij}\) denotes the effect measure \({r}_{ij}^{(p)}\) of situation \({s}_{ij}\) under objective p.

$${D}^{(p)}={\left({r}_{ij}^{(p)}/{s}_{ij}\right)}_{n\times m}.$$
(5)

Combining the single-objective effect measures \({r}_{ij}^{(p)}\) with the entropy weight \({\omega }_{p}\) of each objective gives the comprehensive effect measure \({r}_{ij}^{(\sum )}\) of multiple objectives, defined by Eq. (6).

$${r}_{ij}^{(\sum )}=\sum_{p=1}^{q}{\omega }_{p}{r}_{ij}^{(p)}.$$
(6)

Therefore, the comprehensive decision matrix was defined as given by Eq. (7).

$${D}^{(\sum )}=\left[\begin{array}{cccc}{r}_{11}^{(\sum )}& {r}_{12}^{(\sum )}& \cdots & {r}_{1m}^{(\sum )}\\ {r}_{21}^{(\sum )}& {r}_{22}^{(\sum )}& \cdots & {r}_{2m}^{(\sum )}\\ \vdots & \vdots & \vdots & \vdots \\ {r}_{n1}^{(\sum )}& {r}_{n2}^{(\sum )}& \cdots & {r}_{nm}^{(\sum )}\end{array}\right].$$
(7)

The optimal situation was then selected, and the decision was made accordingly. If \({b}_{{j}^{*}}\) is the best countermeasure for event \({a}_{i}\), i.e., the best entry in the i-th row of the comprehensive decision matrix, its comprehensive effect measure satisfies Eq. (8).

$${r}_{i{j}^{*}}^{(\sum )}=\underset{j}{\mathrm{max}}\left\{{r}_{ij}^{(\sum )}\right\}.$$
(8)

If \({a}_{{i}^{*}}\) is the best event for countermeasure \({b}_{j}\), i.e., the best entry in the j-th column of the comprehensive decision matrix, its comprehensive effect measure satisfies Eq. (9).

$${r}_{{i}^{*}j}^{(\sum )}=\underset{i}{\mathrm{max}}\left\{{r}_{ij}^{(\sum )}\right\}.$$
(9)
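A minimal sketch of the decision steps, assuming the single-objective effect measures have already been computed from the membership functions and stacked into a q × n × m array (objectives × events × countermeasures). It forms the comprehensive matrix of Eq. (6) as an entropy-weighted sum and then takes the row-wise maximum (best countermeasure for each event) and the column-wise maximum (best event for each countermeasure). The function name and array layout are assumptions for illustration.

```python
import numpy as np


def grey_situation_decision(effects, weights):
    """Entropy-weighted grey situation decision.

    effects: (q, n, m) array; effects[p, i, j] is r_ij^(p), the effect measure
             of situation s_ij = (a_i, b_j) under objective p.
    weights: (q,) entropy weights omega_p.
    """
    effects = np.asarray(effects, dtype=float)
    weights = np.asarray(weights, dtype=float)
    combined = np.tensordot(weights, effects, axes=1)  # Eq. (6): n x m matrix
    best_countermeasure = combined.argmax(axis=1)      # best b_j for each event a_i
    best_event = combined.argmax(axis=0)               # best a_i for each b_j
    return combined, best_countermeasure, best_event
```

In the water source problem, each row is a water sample (event) and each column an aquifer (countermeasure), so `best_countermeasure` holds the predicted source of each sample.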

Bayesian discriminant method

If n samples are drawn from G populations Ag, each sample must belong to one of them. If p variables (x1, x2, …, xp) are measured for each sample, each sample can be regarded as a point in the p-dimensional space \({R}^{p}\), and the n samples constitute a p-dimensional sample space. An unknown sample X = (x1, x2, …, xp) is likewise a point in this space and is assigned to the population for which its posterior probability is highest (Yan et al. 2019; Fang et al. 2020; Du et al. 2020). The Bayesian discriminant model is as follows.

Assume there are G populations Ag (g = 1, 2, …, G), each following a p-dimensional normal distribution with the probability density function given by Eq. (10).

$${f}_{g}\left(x\right)={\left(2\pi \right)}^{-\frac{p}{2}}{\left|\Sigma \right|}^{-\frac{1}{2}}\mathrm{exp}\left[-\frac{1}{2}{\left(x-{a}_{g}\right)}^{\prime}{\Sigma }^{-1}\left(x-{a}_{g}\right)\right],$$
(10)

where \(x = \left( {x_{1} ,x_{2} , \cdots ,x_{p} } \right)^{\prime}\). The parameters \(a_{g}\) and \(\Sigma\) represent the mean vector and covariance matrix of Ag, respectively. The prior probabilities qg and the parameters of each Ag are assumed known, and the covariance matrices of the populations are assumed not to differ significantly, so a common Σ is used. The discriminant function is given by Eq. (11).

$${q}_{g}{f}_{g}\left(x\right)={q}_{g}{\left(2\pi \right)}^{-\frac{p}{2}}{\left|\Sigma \right|}^{-\frac{1}{2}}\mathrm{exp}\left[-\frac{1}{2}{\left(x-{a}_{g}\right)}^{\prime}{\Sigma }^{-1}\left(x-{a}_{g}\right)\right],$$
(11)

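where g = 1, 2, …, G. Taking the logarithm of Eq. (11) and dropping the terms that are identical for all populations yields the multivariate linear discriminant function under the Bayes criterion, Eq. (12), with \(({c}_{1g},\dots ,{c}_{pg})^{\prime}={\Sigma }^{-1}{a}_{g}\) and \({c}_{0g}=\mathrm{ln}\,{q}_{g}-\frac{1}{2}{a}_{g}^{\prime}{\Sigma }^{-1}{a}_{g}\). A sample is assigned to the population with the largest \({y}_{g}(x)\).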

$${y}_{g}(x)={c}_{0g}+{c}_{1g}{x}_{1}+{c}_{2g}{x}_{2}+\cdots +{c}_{pg}{x}_{p}$$
(12)
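Taking the logarithm of qg fg(x) and dropping the terms shared by all populations gives the coefficients of Eq. (12) as (c1g, …, cpg)′ = Σ⁻¹ag and c0g = ln qg − ½ ag′Σ⁻¹ag. The sketch below estimates ag by the group means and Σ by the pooled within-group covariance, and by default sets qg to each group's share of the training samples; these estimation choices are assumptions, since the text does not state them. The function names are hypothetical.

```python
import numpy as np


def fit_bayes_linear(X, y, priors=None):
    """Coefficients of the linear Bayes discriminant under a common covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n = X.shape[0]
    means = np.array([X[y == g].mean(axis=0) for g in classes])
    # Pooled within-group scatter -> common covariance estimate.
    scatter = sum((X[y == g] - mu).T @ (X[y == g] - mu) for g, mu in zip(classes, means))
    sigma = scatter / (n - len(classes))
    if priors is None:                                 # assumption: q_g = group share
        priors = np.array([np.mean(y == g) for g in classes])
    sigma_inv = np.linalg.inv(sigma)
    C = means @ sigma_inv                              # row g holds (c_1g, ..., c_pg)
    c0 = np.log(priors) - 0.5 * np.sum(C * means, axis=1)
    return classes, c0, C


def predict_bayes_linear(classes, c0, C, X):
    """Assign each sample to the population with the largest y_g(x)."""
    scores = np.asarray(X, dtype=float) @ C.T + c0
    return classes[np.argmax(scores, axis=1)]
```

Because Σ is shared by all populations, the quadratic term x′Σ⁻¹x cancels between groups, which is why the resulting decision rule is linear in x.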

Mine water inrush identification and verification

Extraction of discriminant variables and entropy weights

As shown in Fig. 2, the concentrations of the alkaline-earth metal ions (Ca²⁺ and Mg²⁺) were significantly higher than those of the alkali metal ions (Na⁺ + K⁺). The main hydrochemical types of the Cenozoic, Permian and Carboniferous aquifers were Ca·Na-Cl·HCO3, Ca·Mg·Na-Cl·HCO3 and Ca·Na-Cl·SO4·HCO3, respectively.

Fig. 2 Piper diagram of all water samples from different aquifers

Heat maps of the correlation coefficients (Figs. 3, 4, and 5) show the degree of correlation among the variables. For the Cenozoic aquifer, the correlations among the variables were strong and all were positive: Cl⁻ was significantly correlated with Na⁺ + K⁺ (correlation coefficient 0.99), and Ca²⁺ with Mg²⁺ (0.88). For the Permian aquifer, Ca²⁺ and Mg²⁺ were positively correlated (0.96); SO₄²⁻ was positively correlated with Ca²⁺ and Mg²⁺ (0.75 and 0.81, respectively); and Na⁺ + K⁺ was positively correlated with HCO₃⁻ (0.79). For the Carboniferous aquifer, Cl⁻ and Na⁺ + K⁺ were significantly correlated (0.98). Such strong correlations cause information overlap and reduce the accuracy of a water inrush discrimination model. To address this, principal component analysis was used to extract the main variables as the discriminant factors of the model.

Fig. 3 Heat map of the correlation coefficient for the Cenozoic aquifer

Fig. 4 Heat map of the correlation coefficient for the Permian aquifer

Fig. 5 Heat map of the correlation coefficient for the Carboniferous aquifer

In general, the number of principal components is determined by the cumulative variance ratio: when it exceeds 80%, the retained components adequately reflect the hydrochemical information of the samples. Based on the Kaiser criterion and the scree plot (Fig. 6), three principal components were retained, with a cumulative variance ratio of 85.91%. As shown in Table 2, principal component 1 explained 36.23% of the variance of the training samples and mainly represented Na⁺ + K⁺; principal component 2 explained 29.15% and mainly represented SO₄²⁻; and principal component 3 explained 20.53% and mainly represented HCO₃⁻. From the principal component score coefficients (Table 3), the relationships between the principal components P1, P2 and P3 and the original variables X1, X2, X3, X4, X5 and X6 are given by Eqs. (13)–(15).

Fig. 6 Scree plot of the principal components

Table 2 Variance interpretation rate and orthogonal rotation factor loading matrix
Table 3 The principal component score coefficients
$${P}_{1}=0.567{X}_{1}+\text{0.312}{X}_{2}+\text{0.268}{X}_{3}+\text{0.471}{X}_{4}+\text{0.416}{X}_{5}\text{+0.339}{X}_{6},$$
(13)
$${P}_{2}=-\text{0.408}{X}_{1}+\text{0.418}{X}_{2}+\text{0.490}{X}_{3}-\text{0.283}{X}_{4}+\text{0.495}{X}_{5}-\text{0.306}{X}_{6},$$
(14)
$${P}_{3}=-\text{0.018}{X}_{1}-\text{0.462}{X}_{2}+\text{0.415}{X}_{3}-\text{0.451}{X}_{4}+\text{0.099}{X}_{5}+\text{0.633}{X}_{6}.$$
(15)
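Eqs. (13)–(15) amount to a single matrix product. The sketch below applies the published score coefficients to rows of (X1, …, X6); it assumes the variables are supplied in the same form as in Table 1, which the text substitutes directly, and the helper name `principal_component_scores` is hypothetical.

```python
import numpy as np

# Score coefficients of Eqs. (13)-(15): rows P1, P2, P3; columns X1 ... X6.
SCORE_COEF = np.array([
    [0.567, 0.312, 0.268, 0.471, 0.416, 0.339],
    [-0.408, 0.418, 0.490, -0.283, 0.495, -0.306],
    [-0.018, -0.462, 0.415, -0.451, 0.099, 0.633],
])


def principal_component_scores(X):
    """Return (P1, P2, P3) for each row of X = (X1, ..., X6)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    if X.shape[1] != 6:
        raise ValueError("expected six ionic variables X1..X6")
    return X @ SCORE_COEF.T
```

Passing a (k, 6) array of samples returns a (k, 3) array, one row of (P1, P2, P3) per sample.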

Construction of water inrush source recognition model

Substituting the ion concentration data in Table 1 into Eqs. (13), (14) and (15) gave the scores of the three principal components, which are presented in Table 4. The ion concentrations of samples from the same aquifer in Table 1 were widely dispersed, and some values were outliers (Fig. 7). Therefore, Huber’s M-estimator was used instead of the mean to represent the central tendency and to obtain the index classification values for the three water sources. Taking the Huber M-estimates of P1, P2 and P3 as the optimal values, the ranking values of the countermeasure set B = {b1, b2, b3} are presented in Table 5. The entropy weights of P1, P2 and P3 were calculated with the entropy weight method and are presented in Table 6.
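The text does not specify the tuning constant or scale estimate used for Huber’s M-estimator. The sketch below assumes a common choice, c = 1.345 with the scale fixed at the normalized median absolute deviation, and computes the location by iteratively reweighted averaging: observations within c scale units of the current estimate get weight 1, and those farther away are down-weighted by c/|r|. The function name `huber_location` is hypothetical.

```python
import numpy as np


def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted averaging."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                                  # robust starting value
    scale = np.median(np.abs(x - mu)) / 0.6745         # MAD rescaled to a sigma estimate
    if scale == 0:
        return float(mu)
    for _ in range(max_iter):
        r = np.abs(x - mu) / scale                     # standardized residuals
        w = c / np.maximum(r, c)                       # 1 inside c, c/|r| outside
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol * scale:
            return float(mu_new)
        mu = mu_new
    return float(mu)
```

For the values 1, 2, 3, 4 and 100, the estimate stays close to 3, whereas the mean is 22, which illustrates why an M-estimator suits dispersed data containing outliers.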

Table 4 Principal component scores
Fig. 7 Scatter plots of anion and cation water samples

Table 5 Rating standards of the countermeasure set
Table 6 Entropy weight table of principal components

Based on the grading criteria of the countermeasure set in Table 5, the membership functions were constructed with the linear half-step function method. The resulting membership functions of P1, P2 and P3 are shown in Fig. 8.

Fig. 8 Membership function of principal components

According to Fig. 8, the membership function of each variable is as follows.

  1. Membership of variable P1.

    The membership function of P1 belonging to the Cenozoic aquifer was defined as given by Eq. (16).

    $${f}_{{P}_{1}}^{Q}=\left\{\begin{array}{cc}1& x<693.3621\\ \frac{933.0847-x}{239.7226}& 693.3621\le x\le \text{933.0847}\\ 0& x>\text{933.0847}\end{array}\right.$$
    (16)

    The membership function of P1 belonging to the Permian aquifer was defined as given by Eq. (17).

    $${f}_{{P}_{1}}^{P}=\left\{\begin{array}{cc}0& x<933.0847\\ \frac{x-933.0847}{225.5420}& 933.0847\le x\le 1158.6267\\ 1& x>1158.6267\end{array}\right.$$
    (17)

    The membership function of P1 belonging to the Carboniferous aquifer was defined as given by Eq. (18).

    $${f}_{{P}_{1}}^{C}=\left\{\begin{array}{cc}0& x<693.3621\\ \frac{x-693.3621}{239.7226}& 693.3621\le x\le 933.0847\\ \frac{1158.6267-x}{225.5420}& 933.0847\le x\le \text{1158.6267}\\ 0& x>\text{1158.6267}\end{array}\right.$$
    (18)
  2. Membership of variable P2.

    The membership function of P2 belonging to the Cenozoic aquifer was defined as given by Eq. (19).

    $${f}_{{P}_{2}}^{Q}=\left\{\begin{array}{cc}0& x<-337.9810\\ \frac{x+337.9810}{70.1005}& -337.9810\le x\le -\text{267.8805}\\ 1& x>-\text{267.8805}\end{array}\right.$$
    (19)

    The membership function of P2 belonging to the Permian aquifer was defined as given by Eq. (20).

    $${f}_{{P}_{2}}^{P}=\left\{\begin{array}{cc}1& x<-619.1755\\ \frac{x+337.9810}{-281.1945}& -619.1755\le x\le -337.9810\\ 0& x>-337.9810\end{array}\right.$$
    (20)

    The membership function of P2 belonging to the Carboniferous aquifer was defined as given by Eq. (21).

    $${f}_{{P}_{2}}^{C}=\left\{\begin{array}{cc}0& x<-619.1755\\ \frac{x+619.1755}{281.1945}& -619.1755\le x\le -337.9810\\ \frac{x+267.8805}{-70.1005}& -337.9810\le x\le -267.8805\\ 0& x>-267.8805\end{array}\right.$$
    (21)
  3. Membership of variable P3.

The membership function of P3 belonging to the Cenozoic aquifer was defined as given by Eq. (22).

$${f}_{{P}_{3}}^{Q}=\left\{\begin{array}{cc}0& x<-137.8058\\ \frac{x+137.8058}{35.8629}& -137.8058\le x\le -101.9429\\ \frac{221.8392-x}{323.7821}& -101.9429<x\le 221.8392\\ 0& x>221.8392\end{array}\right.$$
(22)

The membership function of P3 belonging to the Permian aquifer was defined as given by Eq. (23).

$${f}_{{P}_{3}}^{P}=\left\{\begin{array}{cc}0& x<-101.9429\\ \frac{x+101.9429}{323.7821}& -101.9429\le x\le 221.8392\\ 1& x>221.8392\end{array}\right.$$
(23)

The membership function of P3 belonging to the Carboniferous aquifer was defined as given by Eq. (24).

$${f}_{{P}_{3}}^{C}=\left\{\begin{array}{cc}1& x<-137.8058\\ \frac{x+101.9429}{-35.8629}& -137.8058\le x\le -101.9429\\ 0& x>-101.9429\end{array}\right.$$
(24)
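Each set of membership functions in Eqs. (16)–(24) shares one shape: a descending half-step for the aquifer with the lowest ranking value, a triangle for the middle one, and an ascending half-step for the highest. The sketch below encodes them by their breakpoints with `numpy.interp`, which holds the end values constant outside the breakpoint range, and combines the memberships with per-component weights before taking the largest total. The entropy weights of Table 6 are not reproduced in the text, so the weights must be supplied by the caller; the names `BREAKPOINTS`, `memberships` and `classify_by_membership` are hypothetical.

```python
import numpy as np

# Breakpoints (low, middle, high) read from Eqs. (16)-(24), and the aquifer whose
# membership is the left half-step, the triangle and the right half-step.
BREAKPOINTS = {
    "P1": ((693.3621, 933.0847, 1158.6267), ("Q", "C", "P")),
    "P2": ((-619.1755, -337.9810, -267.8805), ("P", "C", "Q")),
    "P3": ((-137.8058, -101.9429, 221.8392), ("C", "Q", "P")),
}


def memberships(component, x):
    """Degrees of membership of one principal component score x."""
    (lo, mid, hi), (left, middle, right) = BREAKPOINTS[component]
    return {
        left: float(np.interp(x, [lo, mid], [1.0, 0.0])),             # 1 below lo, 0 above mid
        middle: float(np.interp(x, [lo, mid, hi], [0.0, 1.0, 0.0])),  # triangle peaking at mid
        right: float(np.interp(x, [mid, hi], [0.0, 1.0])),            # 0 below mid, 1 above hi
    }


def classify_by_membership(scores, weights):
    """Weighted membership per aquifer and the aquifer with the largest total.

    scores: {"P1": x1, "P2": x2, "P3": x3}; weights: weight per component.
    """
    total = {"Q": 0.0, "P": 0.0, "C": 0.0}
    for component, x in scores.items():
        for aquifer, mu in memberships(component, x).items():
            total[aquifer] += weights[component] * mu
    return max(total, key=total.get), total
```

For example, with hypothetical equal weights, a sample with scores P1 = 1200, P2 = −650 and P3 = 250 lies beyond the Permian breakpoint of every component, so `classify_by_membership` assigns it to the Permian aquifer with a total membership of 1.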

Verification of water inrush source recognition model

The principal component scores P1, P2 and P3 of the 14 validation samples in Table 1 were substituted into the corresponding membership functions, Eqs. (16)–(24), to obtain their degrees of membership. Each sample was then assigned a water source according to the maximum membership principle of fuzzy mathematics; the results are presented in Table 7. The recognition accuracy was 100% for the Cenozoic aquifer samples, 87.50% for the Permian aquifer samples and 66.67% for the Carboniferous aquifer samples, giving an overall accuracy of 85.71%.

Table 7 Results of discriminating water samples

Discussion

The grey situation decision-making method is used to obtain the optimal situation in an environment containing both known and unknown factors (Zu et al. 2018; Zhang et al. 2014). Here it was applied to the identification of water inrush sources, in which many variables are reduced to a single objective for the decision. This overcomes the inability of a single factor to identify the source category. However, traditional multi-objective grey situation decision-making does not reflect the decision makers’ preferences or the actual decision-making situation when handling the decision objectives (Li et al. 2019; Fu 2016; He and Gong 2013). Assigning an entropy weight to each variable, which reflects the degree of difference among the variables, complements and improves the grey situation decision-making method. To evaluate the accuracy of the method, it was compared with the Bayesian discriminant method.

Using the extracted principal components, the classification function coefficients of P1, P2 and P3 were obtained with the Bayesian discriminant method (Table 8). The resulting linear discriminant functions of the three aquifers are given by Eqs. (25)–(27).

Table 8 Discriminant coefficient of classification function
$${y}_{Q}=0.006{x}_{{p}_{1}}+0.005{x}_{{p}_{2}}-0.001{x}_{{p}_{3}}-2.452,$$
(25)
$${y}_{P}=0.006{x}_{{p}_{1}}+0.002{x}_{{p}_{2}}+0.003{x}_{{p}_{3}}-4.146,$$
(26)
$${y}_{C}=0.009{x}_{{p}_{1}}+0.008{x}_{{p}_{2}}-0.001{x}_{{p}_{3}}-3.613.$$
(27)
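Eqs. (25)–(27) can be evaluated directly, assigning a sample to the aquifer with the largest discriminant score. The coefficients below are those printed in the text, which are rounded to three decimals, so samples close to a decision boundary may not reproduce Table 9 exactly. The input used in the usage note is a hypothetical score vector, not a sample from Table 4, and the names `BAYES_COEF` and `bayes_classify` are hypothetical.

```python
# Classification function coefficients of Eqs. (25)-(27):
# (coefficient of P1, coefficient of P2, coefficient of P3, constant term)
BAYES_COEF = {
    "Q": (0.006, 0.005, -0.001, -2.452),
    "P": (0.006, 0.002, 0.003, -4.146),
    "C": (0.009, 0.008, -0.001, -3.613),
}


def bayes_classify(p1, p2, p3):
    """Evaluate y_Q, y_P, y_C and return the aquifer with the largest value."""
    y = {aquifer: a * p1 + b * p2 + c * p3 + const
         for aquifer, (a, b, c, const) in BAYES_COEF.items()}
    return max(y, key=y.get), y
```

For P1 = 1000, P2 = −300 and P3 = −100, the scores are yQ ≈ 2.148, yP ≈ 0.954 and yC ≈ 3.087, so `bayes_classify(1000, -300, -100)` returns "C".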

Substituting the data in Table 4 into Eqs. (25), (26) and (27) gave the discrimination results presented in Table 9. The recognition accuracy was 0% for the Cenozoic aquifer samples, 75% for the Permian aquifer samples and 100% for the Carboniferous aquifer samples, giving an overall accuracy of 64.29%. The new discriminant model proposed in this paper (overall accuracy 85.71%) is therefore more accurate than the traditional Bayesian discriminant method.
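The reported percentages can be cross-checked with simple arithmetic. The text does not give the number of validation samples per aquifer, but with 14 samples in total the only integer group sizes consistent with 87.50% (7/8) and 66.67% (2/3) are 8 Permian, 3 Carboniferous and 3 Cenozoic samples; these inferred sizes, not stated in the source, are used below with exact fractions. The names `SIZES`, `CORRECT` and `accuracies` are hypothetical.

```python
from fractions import Fraction

# Validation-set sizes inferred from the reported percentages (not stated in the text).
SIZES = {"Q": 3, "P": 8, "C": 3}
# Correctly identified samples implied by the reported per-aquifer accuracies.
CORRECT = {
    "entropy weight-grey situation": {"Q": 3, "P": 7, "C": 2},
    "Bayesian discriminant": {"Q": 0, "P": 6, "C": 3},
}


def accuracies(correct, sizes=SIZES):
    """Per-aquifer and overall accuracy as exact fractions."""
    per_group = {g: Fraction(correct[g], sizes[g]) for g in sizes}
    overall = Fraction(sum(correct.values()), sum(sizes.values()))
    return per_group, overall
```

Both overall figures then follow: 12/14 ≈ 85.71% for the entropy weight-grey situation model and 9/14 ≈ 64.29% for the Bayesian discriminant method.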

Table 9 Bayesian discriminant results

Owing to the small number of Cenozoic samples, the Bayesian discriminant method failed to identify any Cenozoic validation sample, and the discrimination of this aquifer by both the grey situation decision method and the Bayesian discriminant method rests on very few samples. The model therefore depends on having a sufficient number of water samples, and more samples should be collected to improve its accuracy. In addition, future versions of the model should account for the effects of temperature, hydrogeological conditions and human activities on the aquifers to further improve its applicability.

Conclusion

Principal component analysis was used to eliminate the internal correlation among the ionic variables. The six ionic variables were combined into three principal components, P1, P2 and P3, which comprehensively reflect the hydrochemical information. This greatly reduces the number of variables and the computational load of the model.

The recognition accuracy was 100% for water samples from the Cenozoic loose aquifer, 87.50% for those from the Permian coal-bearing sandstone aquifer and 66.67% for those from the Carboniferous limestone karst fissure aquifer, and the overall accuracy of the model was 85.71%. The entropy weight-grey situation decision model provides a new method for identifying water inrush sources and offers theoretical guidance for mine water prevention.