Abstract
Mine water inrush is one of the major geological hazards that threaten safe production in coal mines. The accurate identification of mine water inrush sources plays a vital role in mine water disaster control, and it is the key to preventing mine water inrush incidents. Ninety-three water samples were extracted from the three types of aquifers in the Qinan coal mine. The cluster analysis method was then used to analyze 82 of the original water samples, and the other 11 water samples that did not meet the requirements were removed. Then, the remaining 82 water samples were regarded as training samples, and the principal component analysis was completed. Taking the scores of the principal components as the independent variable and the types of water inrush sources as the dependent variable, the multiple logistic regression recognition model was established. Meanwhile, this recognition model was used to recognize the types of mine water inrush sources and verify the recognition accuracy for the 82 training samples. The comprehensive recognition accuracy reached 86.6%, which is much higher than the traditional recognition methods of water inrush sources. Based on cluster analysis, the multiple logistic regression recognition model fully considers the ion content measurement errors and the complex relationships between the internal ions, and this recognition model is more reasonable and improves the accuracy of water inrush source recognition. This paper provides a new method for recognizing the problem of water inrush sources, which also provides an effective basis for mine water inrush prevention and control.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
Introduction
Mine water hazards are one of the main geological hazards that can threaten the safety of coal mining. Coal mine water inrush often causes the partial submergence of a coal mine, causing huge economic losses and human casualties (Gui and Lin 2016; Hu et al. 2011; Wu et al. 2016). The Qinan coal mine is located in the Suxian mining area. Since the coal mine has been operational, there have been fewer occurrences of water bursting, and the water inflow has not been large. However, the Taoyuan coal mine, which belongs to the same hydrogeological unit as the Qinan coal mine, experienced a mine water inrush accident with a maximum water inflow of 29,000 m3/h in 2013, which caused a serious flooding incident. To prevent the occurrence of similar large-scale mine water inrush accidents in the Qinan coal mine, it is necessary to carry out mine water prevention and control work. Among these tasks, the accurate judgment of mine water inrush sources is a prerequisite of coal mine water inrush prevention and control work, as well as an important part of preventing mine water inrush accidents (Ganyaglo et al. 2011; Zhang et al. 2017).
For a long time, experts and scholars had proposed many methods for judging water inrush sources in the problem of “Recognition of mine water inrush”. Water inrush source recognition methods include geological analysis, hydrodynamic analysis, hydrochemical characteristics analysis, water temperature, water level dynamic observation and geophysical prospecting (Biswas and Sharma 2017; Farnham et al. 2000; Panagopoulos et al. 2016; Keskin et al. 2015). Among them, the hydrochemical characteristics analysis method is a simple and effective way to identify a mine water inrush source. Li et al. (2017) used a hydrochemical approach to ascertain the mine water sources and to locate the potential seawater inrush seepage channels in the Xinli Mine. Wei et al. (2015) identifies a water source by analyzing hydrochemical characteristic ions at water inrush points. But, at present, the methods of multivariate statistical analysis have been relatively mature. And the following methods are applied mostly in multivariate statistical analysis: principal component analysis reduces an original set of variables into a smaller number of uncorrelated components without losing much information (Jolliffe 2002; Kim et al. 2005; Meglen 1992; Qian et al. 2016), cluster analysis can measure the similarities among samples (Bu et al. 2010; Reghunath et al. 2002), discriminant analysis (includes Distance discriminant, Fisher discriminant and Bayes discriminant) can establish an intuitive discriminant relation (Chen et al. 2009; Huang and Chen 2011; Lu et al. 2012; Huang and Wang 2018). Xu et al. (2012) selected six sets of ions (K+ + Na+, Ca2+, Mg2+, Cl−, SO42−, HCO3−) and their total dissolved solids (TDS) as discriminant factors for designing a GRA–SDA coupled model. Liu et al. (2013) proposed a Fisher recognition analysis for identifying a coal mining inrush water source under mining-induced disturbances. A comprehensive identification model combining hydrochemistry analysis, water source detection, and water channel exploration was proposed by Liu et al. (2018). Based on the constant ion content test results, including the pH values and total dissolved solid (TDS), Yin et al. (2006) used systemic clustering and stepwise distinguishing to analyze the sources of the inrush water in the Wanbei Mining area. The PCA–BP neural network model, based on laser-induced fluorescence technology, was also used to identify a water inrush source by Wang et al. (2017a, b).
However, the current recognition methods of mine water inrush sources did not fully consider the measurement errors of the ion content caused by external factors and have ignored the complex relationships between the ions. These recognition methods have certain deviations from the actual identification process for mine water inrush sources. Therefore, the objectives of this paper are to propose a new method for the accurate identification of mine water inrush source, and it is the multiple logistic regression recognition model based on cluster analysis, which fully considers the measurement errors of the ion content and the complex internal relationships of ions. This method uses cluster analysis to measure the similarities among original water samples, and its purpose is to screen the original water samples. Principal component analysis is used to extract the information of hydrochemical indexes, and multiple correlated indicator variables are converted into new independent sample indicators. The multiple logistic regression recognition model can predict and classify based on existing water samples. So it can effectively extract the variation information of the original water samples, eliminate the influence caused by the superposition of the information among variables and realize the recognition of mine water inrush sources. In addition, the recognition model was applied to the water samples to be discriminated to verify its accuracy. The results show that the multiple logistic regression recognition model based on cluster analysis has high accuracy. And it is easy to operate in the actual water source discrimination process, with straightforward discrimination results.
Hydrogeological conditions in the study area
The Qinan coal mine is located in the middle of Huaibei plain, and it is distributed in the Huaihe River valley, positions shown in Fig. 1. The Huihe River, a tributary of the Huaihe River, flows through the mining area, and It has high vegetation coverage. The study area belongs to the north temperature monsoon region ocean—continental climate and has distinctive four seasons. As a typical central plain climate, the annual average temperature and annual average precipitation are about 14.6 °C and 756 mm, respectively. Rainfall is concentrated in July and August. The evaporation capacity is higher than the precipitation, and the annual average relative humidity is 71%.
The Qinan coal mine is located in the southwest region of the Sunan syncline. The inclination of the strata in the northern coal mine is steep, generally ranging from 20° to 30°. However, the inclination of the strata in the middle and eastern regions of the Qinan coal mine is gentler, generally ranging from 7° to 15°. The coal-bearing strata within the study area are covered by a loose layer from the Cenozoic period.
The groundwater regime in the mining areas of Qinan consists of four subsystems: the loose aquifer of the Cenozoic, the coal-bearing sandstone fissure aquifer of the Permian, the limestone-karst fissure aquifer in the Taiyuan formation of the Carboniferous and the limestone karst fracture aquifer of the Ordovician. The hydrogeological characteristics of the aquifer and aquifuge are shown in Fig. 2.
Among them, the limestone karst fracture aquifer of the Ordovician is furthest away from the coal seam. Thus, under normal conditions, there is no direct water filling effect on the coal mine.
Materials and methods
Sampling and test
We collected a total of 93 original water samples from the Qinan coal mine during the period of 2000–2017 (positions shown in Fig. 1), and the 93 original water samples were evenly distributed between 2000 and 2017. These water samples were used to establish the recognition model. Among them, there were 9 water samples of the fourth aquifer in the loose layer of the Cenozoic (referred to as “the fourth aquifer”), 39 water samples of the coal-bearing sandstone fissure aquifer of the Permian (referred to as “the coal-bearing sandstone aquifer”) and 45 water samples of the limestone-karst fissure aquifer in the Taiyuan formation of the Carboniferous (referred to as “the limestone aquifer”). In addition, 16 water samples from the Qinan mining area were taken from the site for verification model. Among them, there were 2 water samples of the fourth aquifer, 4 water samples of the coal-bearing sandstone aquifer and 10 water samples of the limestone aquifer.
When the water samples were collected, plastic bottles and covers were rinsed three to five times using sampling water. Later, water samples were stored in a clean 550 ml plastic bottle. Before the test, the water samples were processed at low temperature to inhibit the redox reaction and biochemical action (Chen et al. 2013; Faghih Nasiri et al. 2018). The conventional water chemistry tests include the contents of K+ + Na+, Ca2+, Mg2+, Cl−, SO42−, HCO3− and CO32−. Among them, HCO3− and CO32− were tested by dilute sulfuric acid-methyl orange titrimetry, Cl− and SO42− were tested by ion chromatography, Ca2+ and Mg2+ were tested by EDTA titration method and K+ + Na+ was tested by flame atomic absorption spectrophotometry. It was known from hydrogeological data that the 109 water samples were taken from drain holes, hydrogeological observation wells, extracting coal faces and underground roadways. The water levels of the observation wells did not show any abnormal changes during the collection of water samples. As such, this study was only concerned with the 109 water samples from a static perspective. The water sample data are shown in Table 1. Among them, X1, X2, X3, X4, X5, X6 and X7 represent the contents of K+ + Na+, Ca2+, Mg2+, Cl−, SO42−, HCO3− and CO32−, respectively.
Cluster analysis
The principle of cluster analysis is that n different samples are regarded as n different classes, and the two classes with the closest properties (or the shortest distance) can be merged into the same class. Then, the next two classes with the closest properties (or the shortest distance), from the n − 1 classes, are combined. This process continues until all the samples have been merged into a single class. In the cluster analysis, we usually divide it into Q-type cluster analysis and R-type cluster analysis based on the differences of classification objects. And Q-type cluster analysis is the classification of samples, while R-type cluster analysis is the classification of variables. The basic algorithm steps of cluster analysis are shown below:
-
1.
At the beginning, each sample is a separate class, and the distance matrix between two pairs of n classes is calculated, denoted as:
$$D_{0} = \left[ {\begin{array}{*{20}l} 0 \hfill & {} \hfill & {} \hfill & {} \hfill & {} \hfill \\ {d_{21} } \hfill & 0 \hfill & {} \hfill & {} \hfill & {} \hfill \\ {d_{31} } \hfill & {d_{32} } \hfill & 0 \hfill & {} \hfill & {} \hfill \\ \vdots \hfill & \vdots \hfill & \vdots \hfill & {} \hfill & {} \hfill \\ {d_{n1} } \hfill & {d_{n2} {\kern 1pt} } \hfill & {d_{n3} } \hfill & \ldots \hfill & 0 \hfill \\ \end{array} } \right].$$ -
2.
Find the minimum distance value dij in the distance matrix, and denoted as \(d_{{i_{1} j_{1} }}\), and combine the i1 and j1 classes into the n − 1 class.
-
3.
Calculate the distance between class n − 1 and other classes;
-
4.
Merge rows i1, j1 in the initial distance matrix D0 into new row, and columns i1, j1 into new column, the number of classes is reduced by one. We can get the new distance matrix D1.
-
5.
Repeat steps (2) (3) and (4) until n samples are clustered into one class.
-
6.
The clustering process was made into a cluster analysis diagram. And the original samples were screened according to the cluster analysis diagram to eliminate the samples that did not meet the requirements.
Principal component analysis
Principal component analysis is a method for original data compression and characteristic information extraction. It can replace many correlated variables with several comprehensive variables. These comprehensive variables not only express a great amount of information of the original variables but can also remain mutually independent (Jolliffe 2002; Kim et al. 2005; Meglen 1992; Qian et al. 2016; Huang et al. 2019). The basic principle is:
If X1, X2, …, Xn are defined as the original variables and Y1, Y2, …, Ym (m ≤ n) are new variables, the relationship between the original and new variables is
where Yi is independent from Yj (i ≠ j; i, j = 1, 2, …, m). Y1 is the item with the highest variance in all linear combinations of X1, X2, …, Xn, and Y2 is the item with the maximum variance in all linear combinations of X1, X2, …, Xn independent from Y1. The rest can be performed in the same way. The new variables Y1, Y2, …, Ym are the first, second, …, and the m principal component of the original variables X1, X2, …, Xn.
Multiple logistic regression analysis
Among n multiple logistic regression analysis models, assuming Pi (i = 1, 2, …, n) is the probability of the sample belonging to the type i (i = 1, 2, …, n). Taking the reference type that the sample belongs to type n, the multiple logistic regression analysis models are as follows (Wang 2010; Wang and Guo 2001; Zhang 2002):
Because the sum of the probabilities that the samples belong to n types is 1, so we could get Formula (3):
Simultaneous Formulas (2) and (3), we were then able to derive the following Formula (4):
where, P1, P1, …, Pn are the probability functions of the respective recognition models of types 1, 2, …, n; Xi represent the value of independent variables; Ai and Bi represent the coefficient of constant ion contents, respectively.
Establishment and verification of recognition model
The sequence of steps taken using the water inrush source recognition methodology is described as follows (Fig. 3).
Q-type cluster analysis of the original water samples
To reduce the deviation of ion content caused by the external factors, for example, polluted water samples, large water evaporation because of poor sealing of the container, and the measurement deviation caused by human error in the testing process, the ion content of the original water samples were used as the analysis variables, we used MATLAB to complete the Q-type cluster analysis of the 93 original water samples. The results of the cluster analysis are shown in Fig. 4.
From the results of Q-type cluster analysis of the original water samples displayed in Fig. 4 and according to the distance between the original water samples (Güler et al. 2002). We can re-classify the original water samples and get new classification results. Among them, there are differences between the new classification results and the original classification results for 93 original water samples, and those water samples are 1, 4, 13, 31, 38, 39, 42, 45, 72, 82 and 84. In the process of discrimination, these water samples will have an impact on the results of discrimination, so we eliminate these water samples which are not consistent with the original classification results, and improve the accuracy of discrimination.
R-type cluster analysis of training samples
This paper used MATLAB to complete the R-type cluster analysis of the training samples. We regarded the content of K+ + Na+, Ca2+, Mg2+, Cl−, SO42−, HCO3− and CO32− as the clustering bases, and the R-type cluster analysis results for the three types of aquifer were obtained (Fig. 5).
It can be seen from the results of the R-type cluster analysis that the degree of similarity is high between Ca2+ and Mg2+ and between Cl− and CO32− in the fourth aquifer. In addition, there is a relationship between certain ions in the coal-bearing sandstone aquifer, Ca2+, Mg2+ and SO42− are closely related, as are K+ + Na+ and Cl−. The cause of this phenomenon is the origin of the samples: Ca2+, Mg2+ and SO42− were derived from the dissolution of sulfate rocks, and K+ + Na+ and Cl− came from soluble sodium–potassium salt rocks. In the limestone aquifer, ions such as Ca2+ and Mg2+ as well as K+ + Na+ and Cl− are also closely related. The reason being that Ca2+ and Mg2+ came from the partial dissolution of insoluble carbonate rocks. However, the relationship between CO32−, Ca2+ and Mg2+ was relatively small because of the reaction of CO32− + H2O ⇌ HCO3− + OH− proceeding in the positive direction when the concentration of CO32− is increased. Thus, the concentration of CO32− in the groundwater decreased, and the relationship between Ca2+ and Mg2+ was low. It is consistent with the alkalinity of the water samples from the limestone-karst fissure aquifer in the Taiyuan formation of the Carboniferous.
The ions in coal mine water have certain internal connections between them, and these inherent connections were often ignored in the process of establishing recognition models of mine water inrush sources, which led to excessive deviation in the practical applications of recognition models for water inrush sources. This deviation has brought a series of serious influences on the actual production of coal mines. To reduce this deviation, the training samples were preprocessed using the method of factor analysis. Finally, the recognition model was established.
Principal component analysis of the training samples
To verify the results of the R-type cluster analysis, a Pearson correlation analysis was conducted on the training samples (Chen et al. 2013; Huang and Wang 2018; Kim et al. 2005; Qian et al. 2016), and the Pearson correction coefficient of the three types of water samples were then obtained (Tables 2, 3, 4).
From Tables 2, 3 and 4, we could see that the correlations between some ions in each aquifer were remarkable (Qian et al. 2016). In the water sample of the fourth aquifer, the concentrations of Ca2+ and Mg2+ were positively correlated (r = 0.767, p < 0.01), Cl− and CO32− were significantly correlated (r = 0.971, p < 0.01); In the water sample of the coal-bearing sandstone aquifer, both Ca2+ and Mg2+ were moderately correlated with SO42− (Ca2+ vs. SO42−: r = 0.399, p < 0.05; Mg2+ vs. SO42−: r = 0.359, p < 0.05; Table 3), and K+ + Na+ showed positive correlations with SO42− (r = 0.481, p < 0.01). In addition, Ca2+ and Mg2+ were also significant correlations (r = 0.877, p < 0.01) in the water sample of the limestone aquifer. Comparing the results of the R-type cluster analysis with the Pearson correlation coefficient, the correlation between the ions of each aquifer was basically consistent. It was, therefore, fully suggested that there is an internal connection between the ions in coal mine water.
To solve any problems with the connections among internal ions, the factor analysis of the training samples was then processed using SPSS. We used the principal component analysis to reduce the number of factors to 7, and the 7 original factors were then combined into 3 independent indicators to reflect the hydrochemical information.
Using the principal component analysis in factor analysis, the initial factors were extracted from the ion’s correction coefficient matrix, and the initial eigenvalue and the variances explained by the principal component analysis were obtained (Table 5).
The number of principle components could be determined by the cumulative variance of the principle components. It is generally thought that the cumulative variance of extracting principal components is more than 80%, which means that the selected number of principal components can fully reflect the hydrochemical information of the training samples (Chen et al. 2013; Wang et al. 2017a, b; Yin et al. 2006; Zhang et al. 2017). Therefore, we extracted three principal components, which were consistent with the results of selecting the number of principal components according to the eigenvalues. To some extent, the number of principal components could be determined using eigenvalues greater than 1 as criteria. The eigenvalues of the principal components are shown in Fig. 6.
The maximum variance algorithm was used for the orthogonal rotation of the initial load matrix of factors so that loads of each ion on the same factor were distinctly different. The orthogonal rotation converges after 6 iterations, and the orthogonal rotation factor loading matrix (Table 6) and the orthogonal rotation factor loading diagram (Fig. 7) could then be obtained. After the orthogonal rotation of three types of water samples, each principal component represented the hydrochemical information of different ions. Among them, principal component 1 represented Ca2+, Mg2+ and SO42− and reflected the information of 42.637% of the training samples. Principal component 2 represented Cl− and CO32− and reflected the information of 25.347% of the training samples. Principal component 3 represented K+ + Na+ and reflected the information of 14.661% of the training samples.
Principal component analysis of the 82 training water samples was carried out using SPSS, and we obtained scores for three principal components from the 82 training water samples (Table 7). The scores of the principal components were expressed as Y1, Y2 and Y3, respectively. The correlation coefficients between the three types of principal components and the original variables are shown in Table 8.
According to the principal component score coefficients, we could get the expression of principal component scores, relational expressions of the three extracted principal components Y1, Y2 and Y3 with the original variables X1, X2, X3, X4, X5, X6 and X7 were obtained as follows:
where Y1, Y2 and Y3 represent the scores of principal component 1, principal component 2 and principal component 3 of the training samples; X1, X2, X3, X4, X5, X6 and X7 represent the contents of K+ + Na+, Ca2+, Mg2+, Cl−, SO42−, HCO3− and CO32−, respectively.
Construction of the recognition model
We regarded the principal component scores Y1, Y2 and Y3 of the 82 training water samples as independent variables for implementing the multiple logistic regression recognition analysis. The parameters of the multiple logistic regression recognition model are shown in Table 9.
The recognition function of the solution is as follows:
Formula (6) could be simplified using Formula (5). We were then able to derive the following Formula (7).
Finally, the expressions for the probability functions of the three types of water inrush sources are as follows:
where P1, P2 and P3 are the probability functions of the respective recognition models of types 1, 2, and 3; X1, X2, X3, X4, X5, X6 and X7 represent the contents of K+ + Na+, Ca2+, Mg2+, Cl−, SO42−, HCO3− and CO32−, respectively; and the final item of the discriminant function is a constant.
Verification of water inrush source recognition model
The 82 groups of training samples in Table 7 were integrated into the established multiple logistic regression recognition model based on cluster analysis one by one for cross-validation (Table 10). The results showed that all water samples were discriminated with a discrimination rate of 87.8%. Among them, the recognition accuracy of water samples from the fourth aquifer is 42.8%, the recognition accuracy of water samples from the coal-bearing sandstone aquifer is 96.9% and the recognition accuracy of water samples from the limestone aquifer is 88.1%. The reason for the difference in the recognition accuracy of various aquifers lies in the difference in the number of training water samples. Because coal mining is less threatened by water inrush from the fourth aquifer, the limited number of water samples were collected from the fourth aquifer. However, the recognition model is established based on a certain amount of water samples. Therefore, the recognition accuracy of water samples from the fourth aquifer significantly different from the coal-bearing sandstone aquifer and the limestone aquifer. Meanwhile, this result can be compared to the traditional multiple logistic regression recognition model, which incurred multiple errors in its rediscrimination steps and had a correct discrimination rate of less than 78.5%. Therefore, the multiple logistic regression recognition model based on cluster analysis was more accurate, had a higher degree of stability, and could meet the actual requirements of water inrush source recognition.
In addition, to further verify the accuracy of the established multiple logistic regression recognition model based on cluster analysis, 16 water samples to be discriminated from the Qinan mining area were substituted into the multiple logistic regression recognition model for discrimination (Table 11). Table 11 shows that 16 water samples are classified accurately by the established multiple logistic regression recognition model based on cluster analysis and only one sample is wrongly discriminated, showing an accuracy of 93.8%. Water sample A2 is actually the fourth aquifer water sample, but it is discriminated as the coal-bearing sandstone aquifer water sample in the model. Through comprehensive comparison, the multiple logistic regression recognition model based on cluster analysis was seen to be more accurate and to have greater extensive applicability than those of the traditional multiple logistic regression recognition model. Therefore, the multiple logistic regression recognition model based on cluster analysis has significant engineering relevance.
Results and discussion
Based on the hydrogeological conditions of the mining area, cluster analysis of water quality samples was carried out in this paper. The analysis results were then utilized to analyze and extract typical water samples. At last, the multiple logistic regression recognition model based on cluster analysis was established. According to the results of the model recognition and the engineering application, the conclusions were drawn as follows:
-
1.
Through the cluster analysis of the original water samples, the nonconforming water samples were eliminated. The 82 water samples that accurately reflect the hydrochemical characteristics of the water inrush aquifer were screened from 93 original water samples, and they were used as training samples to establish the recognition model, which reduced the influence of the errors caused by the water quality analysis on the accuracy of the mode.
-
2.
In the process of establishing recognition model, to eliminate the internal connections between the ions, this paper adopted the principal component analysis method to cut down the dimension of the initial seven types of variables and combine the original seven factors into a few independent indexes to comprehensively reflect the hydrochemical information.
-
3.
The overall recognition accuracy of the multiple logistic regression recognition model based on cluster analysis reaches 87.8% and has high accuracy. It is easy to operate in the actual water source discrimination process, with straightforward discrimination results. This recognition model provides a new way to discriminate mine water inrush sources and has important guiding significance for mine water prevention and control work.
-
4.
Because the recognition model is based on hydrological data from a certain amount. And the quantity of water sample has certain influence on the accuracy of the recognition model. Therefore, we should collect more water sample data to improve accuracy. In addition, given the complexity of hydrogeological conditions, temperature, and human activities on aquifers in the study area, future studies should fully consider the impact of these factories to promote the applications of the model.
References
Biswas A, Sharma SP (2017) Geophysical surveys for identifying source and pathways of subsurface water inflow at the Bangur chromite mine, Odisha, India. Nat Hazards 88(2):947–964
Bu HM, Tan X, Li SY, Zhang QF (2010) Water quality assessment of the Jinshui River (China) using multivariate statistical techniques. Environ Earth Sci 60(8):1631–1639
Chen HJ, Li XB, Liu AH, Peng SQ (2009) Identifying of mine water inrush sources by Fisher discriminant analysis method. J Cent South Univ 40:1114–1120
Chen LW, Yin XX, Liu X, Gui HR (2013) Multivariate statistical analysis on hydrochemical evolution of groundwater in the concealed coal mines in North China. Coal Geol Explor 41(6):43–51
Faghih Nasiri E, Yousefi Kebria D, Qaderi F (2018) An experimental study on the simultaneous phenol and chromium removal from water using titanium dioxide photocatalyst. Civ Eng J 4(3):585
Farnham IM, Stetzenbach KJ, Singh AK, Johannesson KH (2000) Deciphering groundwater flow systems in Oasis Valley, Nevada, using trace element chemistry, multivariate statistics, and geographical information system. Math Geosci 32(8):943–968
Ganyaglo SY, Banoeng-Yakubo B, Osae S, Dampare SB (2011) Water quality assessment of groundwater in some rock types in parts of the eastern region of Ghana. Environ Earth Sci 62:1055–1069
Gui HR, Lin ML (2016) Types of water hazards in China coalmines and regional characteristics. Nat Hazards 84(2):1501–1512
Güler C, Thyne GD, McCray JE, Turner KA (2002) Evaluation of graphical and multivariate statistical methods for classification of water chemistry data. Hydrogeol J 10(4):455–474
Hu W, Dong S, Yan L (2011) Water hazard control technology for safe extraction of coal resources influenced by faulted zone. Procedia Earth Planet Sci 3:1–10
Huang PH, Chen JS (2011) Fisher identify and mixing model based on multivariate statistical analysis of mine water inrush sources. J China Coal Soc 36(S1):131–136
Huang PH, Wang XY (2018) Piper-PCA-Fisher recognition model of water inrush source: a case study of the Jiaozuo mining area. Geofluids 2018:1–10
Huang PH, Yang ZY, Wang XY, Ding FF (2019) Research on Piper-PCA-Bayes-LOOCV discrimination model of water inrush source in mines. Arab J Geosci 12:334
Jolliffe IT (2002) Principal component analysis. Wiley, Hoboken
Keskin TE, Düğenci M, Kaçaroğlu F (2015) Prediction of water pollution sources using artificial neural networks in the study areas of Sivas, Karabük and Bartın (Turkey). Environ Earth Sci 73(9):5333–5347
Kim JH, Kim RH, Lee J, Cheong TJ, Yum BW, Chang HW (2005) Multivariate statistical analysis to identify the major factors governing groundwater quality in the coastal area of Kimje. South Korea. Hydrol Process 19(6):1261–1276
Li GQ, Meng ZP, Wang XQ, Yang J (2017) Hydrochemical prediction of mine water inrush at the Xinli Mine, China. Mine Water Environ 36(1):78–86
Liu X, Chen LW, Lin ML, Li SD (2013) Fisher recognition analysis for coal mining inrush water source under mining-induced disturbance and inversion of groundwater recharge relation. Hydrol Eng Geol 40(4):36–43
Liu Q, Sun YJ, Xu ZM, Xu G (2018) Application of the comprehensive identification model in analyzing the source of water inrush. Arab J Geosci 11(9):189
Lu JT, Li XB, Gong FQ (2012) Recognizing of mine water inrush sources based on principal components analysis and fisher discrimination analysis method. China Saf Sci J 22(7):109–115
Meglen RR (1992) Examining large databases: a chemometric approach using principal components analysis. Mar Chem 39(1):217–237
Panagopoulos GP, Angelopoulou D, Tzirtzilakis EE, Giannoulopoulos P (2016) The contribution of cluster and discriminant analysis to the classification of complex aquifer systems. Environ Monit Assess 188:591
Qian J, Wang L, Ma L, Lu YH, Zhao WD, Zhang Y (2016) Multivariate statistical analysis of water chemistry in evaluating groundwater geochemical evolution and aquifer connectivity near a large coal mine, Anhui, China. Environ Earth Sci 75(9):747
Reghunath R, Murthy TRS, Raghavan BR (2002) The utility of multivariate statistical techniques in hydrogeochemical studies: an example from Karnataka, India. Water Res 36(10):2437–2442
Wang LB (2010) Multivariate statistical analysis: models, case study and application of SPSS. Economic Science Press, Beijing
Wang JC, Guo ZG (2001) Logistic regression model-methods and applications. Higher Education Press, Beijing
Wang XY, Zhao W, Liu XM, Wang TT, Zhang JG, Guo JW, Chen GS, Zhang B (2017a) Identification of water inrush source from coalfield based on entropy weight-fuzzy variable set theory. J China Coal Soc 42(9):2433–2439
Wang Y, Zhou MR, Yan PC, He CY, Liu D (2017b) Identification of coalmine water inrush source with PCA-BP model based on laser-induced fluorescence technology. Spectrosc Spectr Anal 37(3):978–983
Wei WX, Han J, Shi LQ, Lu XM, Zhang XJ (2015) Application of modern data analysis in mine water gushing prediction. Coal Industry Press, Beijing
Wu Q, Guo XM, Shen JJ, Xu S, Liu SQ, Zeng YF (2016) Risk assessment of water inrush from aquifers underlying the Gushuyuan coal mine, China. Mine Water Environ 36(1):1–8
Xu B, Zhang Y, Jiang L (2012) Coupled model based on grey relational analysis and stepwise discriminant analysis for water source. Rock Soil Mech 33(10):3122–3228
Yin XX, Xu GQ, Gui HR, Chen LW (2006) Analyzing for sources of inrush-water in Wanbei Mining Area by systemic clustering and stepwise distinguishing. Coal Geol Explor 34(2):61–64
Zhang WT (2002) SPSS 11.0 statistical analysis tutorial (advanced). Beijing Hope Electronic Press, Beijing
Zhang H, Yao DX, Lu HF, Zhu NN, Xue L (2017) Application of principal component analysis and bayes discrimination approach in water source identification. Coal Geol Explor 45:87–93
Acknowledgements
The project was supported by the National Natural Science Foundation of China (Grant nos. 41672273, 51474008), the Fundamental Research Funds for the Central Universities (22120180313) and the Anhui Natural Science Foundation of China (1508085QE89). The research was also substantially supported by the Key Laboratory of Geotechnical and Underground Engineering of Ministry of Education (Tongji University).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zhang, H., Xing, H., Yao, D. et al. The multiple logistic regression recognition model for mine water inrush source based on cluster analysis. Environ Earth Sci 78, 612 (2019). https://doi.org/10.1007/s12665-019-8624-2
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s12665-019-8624-2