Introduction

Coal is the most abundant and most widely distributed fossil fuel on Earth. As an important energy source for the rapid development of the global economy, coal is in increasing demand (Wang et al. 2015a, b). Meanwhile, the hydrogeological problems facing coal mining have become increasingly prominent. Mine water inflow refers to the quantity of surface water or groundwater entering the shaft and roadway system per unit time through fissures, faults or other channels during mining (Xu and Gong 2011; Dong et al. 2021). To ensure production safety, mine water inflow prediction is a priority in water hazard prevention and control in the prospecting stage as well as in the mine construction and production stages (Wu et al. 2013; Singh and Atkins 1985). Mine water inflow directly affects the rationality of the coal mining scheme and the drainage capacity design and, more importantly, determines whether coal mining is safe (Li et al. 2022a; Hu and Zhao 2021; Polak et al. 2016).

To make predictions of mine water inflow more consistent with reality, researchers have developed two types of prediction methods: uncertainty analysis methods and deterministic mathematical models. The uncertainty methods mainly include correlation analysis (Qiu et al. 2020), support vector machines (Li et al. 2010), neural networks (Zuo et al. 2011) and grey system theory (Wang et al. 2015a, b; Ma and Bai 2015; Xu et al. 2012). The deterministic methods mainly include numerical methods (Li et al. 2015; Wu et al. 2019; Bai et al. 2021; Krukovska and Vynohradov 2019; Bouw and Morton 1987) and analytical methods (Hou 2012; Li et al. 2014). Miladinović et al. (2015) used a linear correlation regression model to predict and correct the mine water inflow of the Štavalj Coal Mine in southwestern Serbia. Wei et al. (2011) built a water inflow prediction model based on support vector machines to predict the water inflow at a new working face, and the results were verified during the mining of that working face. Shao et al. (2014) established a nonlinear artificial neural network prediction model and predicted the normal mine water inflow during mine operation, and the results were consistent with the actual data. Ma et al. (2020) established two exponentially weighted moving average modified grey water inflow models optimized by particle swarm optimization, and obtained water inflow prediction equations based on actual data from the Buliangou coal mine. Guo et al. (2009) established a three-dimensional numerical model with COSFLOW and simulated the water inflow of two mines in Australia. Singh et al. (2012) used the SEEP/W finite element software package to predict the water inflow from surface mining excavation, and verified the prediction results against the analytical solution. Li et al. (2021) combined Monte Carlo methods and FLAC3D to generate a discrete fracture network, established an optimized water inflow prediction model based on the fluid–solid coupling method, and proposed an analytical formula for water inflow prediction. Chen et al. (2015) used the big well method and a three-dimensional numerical model to predict the water inflow from the roof sandstone aquifer in coal seam mining, and compared the results of the two methods. Zhang et al. (2017) took the No. 2 coal seam in the Pingdingshan No. 10 coal mine as an example and compared the performance of three methods (analogue, big well, and numerical simulation) in forecasting mine water inflow.

The existing mine water inflow prediction methods generally emphasize a single factor and do not consider the multiple factors that affect the occurrence and control of mine water inflow. They also do not integrate the mechanisms, positions and progress of mine water inflow into the mining engineering process, so that the inflow is predicted as a static quantity instead of a dynamic quantity that changes as mining advances. In recent years, as mining conditions worldwide have become more complex, especially with increasing mining depth, mine hydrogeological conditions often fall outside the range of experience, so that some methods are no longer applicable. Based on an analysis of typical coal mine hydrogeological conditions, this paper determines the main factors affecting mine water inflow. Using multivariate regression theory and MATLAB function programming, a nonlinear regression fit between mine water inflow and each factor is carried out and combined with the factor weights determined by the entropy method to construct a weighted multivariate nonlinear regression prediction model for mine water inflow. The established model compensates for the flaw of previous prediction methods, which did not consider the differences in the importance of each factor, and can reduce the prediction error caused by a low level of survey or a lack of hydrogeological parameters.

Overview of the research area

Physical geography

Wulunshan Coal Mine is located in Shuguang Town, Nayong County, Guizhou Province, China. Its geographic coordinates are 105°16′01″ ~ 105°20′35″E and 26°34′59″ ~ 26°40′15″N. The mine lot is 9.6 km long from south to north and 4.6 km wide on average, covering an area of 44.02 km² (the geographic location is shown in Fig. 1). The mine is characterized by a plateau and middle-mountain topography, with elevations ranging from 1,500 m to 2,000 m and a relative height difference of 300 m to 500 m. The terrain is overall higher in the northwest and lower in the southeast. The dominant landforms are river valleys and gullies. The main watercourses within the mine area are the Shuigong River, the Sancha River, and the gullies on either side, which are distributed in a branching pattern and belong to the Wujiang River system.

Fig. 1 Location of Wulunshan Coal Mine

Geological condition

(1) Strata and coal seam

The strata exposed in the mine lot are, from oldest to youngest: Middle Permian Maokou Formation (P2m), Upper Permian Emeishan Basalt Formation (P3β), Upper Permian Longtan Formation (P3l), Upper Permian Changxing Formation (P3c), Lower Triassic Feixianguan Formation (T1f), Lower Triassic Yongningzhen Formation (T1yn) and Quaternary (Q). The coal-bearing strata belong to the Longtan Formation, which is composed of siltstone, fine sandstone, bioclastic limestone, mudstone, and coal seams. The minable coal seams are No.3, No.5–3, No.6–3, No.8, and No.33, and the main mining seams are No.3 and No.8 (see Fig. 2).

Fig. 2 Bar graph of the relationship between strata and coal seam

(2) Geological structure

Wulunshan Coal Mine is located in the southern section of the Jiaga anticline and on the west limb of the Shuigonghe syncline. Overall, the structure is monoclinal, with secondary anticlines and synclines superimposed on it. The strata strike 130 ~ 160° and dip to the northeast. The dip is steep in the shallow part (25 ~ 40°) and gradually becomes gentle at depth (5 ~ 20°). The faults within the mining area are mainly high-angle normal faults with dips of 66 ~ 80° and throws of about 10 ~ 25 m (see Fig. 3).

Fig. 3 Distribution of geological structure

Hydrogeological condition

The main groundwater source in the study area is rainfall, and the groundwater level is controlled by terrain and rainfall. The groundwater can be divided into carbonate karst water, clastic rock fissure water and loose rock pore water. The No.2 (T1f2) and No.4 (T1f4) sections of the Feixianguan Formation and the Maokou Formation (P2m) are thick limestone strata with evident dissolution, well-developed karst caves, dissolution pores and grikes, and strong water yield capacity. According to the pumping test in borehole Zk1, the unit water inflow is 0.052 L/(s·m), the water temperature is 13 ~ 16 ℃, and the pH is 6.8 ~ 7.74. The remaining clastic strata contain thin limestone with underdeveloped karst and have weak water yield capacity. The scattered Quaternary (Q) residual and diluvial loose deposits have strong water permeability and moderate water yield capacity. The direct sources of water filling for coal mining are the fissured aquifers of the Longtan Formation (P3l) and the Changxing Formation (P3c), and the indirect sources are the karst aquifers of the No.2 section of the Feixianguan Formation (T1f2) and the Maokou Formation (P2m). The hydrogeological profiles are shown in Fig. 4. In mines where the coal seams are surrounded by impermeable rock formations, water can still enter the mine through fractures and faults in the rock. If the water is under high pressure, it increases the stress on the surrounding rock and can open new fractures or widen existing ones. This raises the risk of rock falls and other forms of rock instability, which compromise the safety of the mine and its workers.

Fig. 4 Hydrogeological profiles

Model data for training and validation

The data used for model training and validation were collected from Wulunshan Coal Mine and comprise 65 monthly records from July 2012 to November 2017 (Table S1). Of these 65 records, 50 were used for training and 15 for validation. The recorded variables are precipitation, aquifer thickness, mining area, mining depth, mining thickness, driving footage, and water inflow. The six explanatory variables were selected for their potential influence on water inflow into the mine, and together with the measured inflow they form the basis of the prediction models developed below.

Correlation analysis and weight determination of the influencing factors

Mine water inflow is closely related to geological structure, mining engineering, hydrogeological conditions, etc. Moreover, the relationships between water inflow and the influencing factors are often highly nonlinear and complex (Qiu et al. 2017; Liu et al. 2018; Li and Zhou 2006; Shi et al. 2017). Much related research has been done. For example, Wu et al. (2017) used the vulnerability index method, which couples GIS with the analytic hierarchy process, to evaluate the water inrush risk of the No. 15 seam of the Gushuyuan coal mine. Li et al. (2022b) used grey relational analysis and the analytic hierarchy process to establish an evaluation model for water inrush from the coal floor, and applied it to a typical working face of the Yuzhou coalfield in north China to demonstrate the evaluation process. Based on the results of previous studies and the geological and hydrogeological conditions of Wulunshan Coal Mine, this paper identifies the influencing factors of water inflow as precipitation, aquifer thickness, mining area, mining depth, mining thickness and driving footage. In addition, 65 groups of monthly water inflow data were collected from Wulunshan Coal Mine from July 2012 to November 2017; 50 groups serve as training samples for the prediction model and 15 groups are used to test its prediction results, as shown in Table S1 (supplementary material).

Correlation analysis theory

In the early twentieth century, the British statistician Pearson put forward a coefficient for calculating the linear correlation between two variables, called Pearson’s correlation coefficient (Gross 1975; Katsaounis 2004). The coefficient is hereby used to analyze the correlation between mine water inflow and each influencing factor, as shown in Eq. (1) (Fiorillo and Doglioni 2010; Liu et al. 2019a, b).

$$r=\frac{\sum_{i=1}^{n}({x}_{i}-\bar{x})({y}_{i}-\bar{y})}{\sqrt{\sum_{i=1}^{n}{({x}_{i}-\bar{x})}^{2}}\sqrt{\sum_{i=1}^{n}{({y}_{i}-\bar{y})}^{2}}},\quad {r}^{2}\le 1$$
(1)

Where xi is mine water inflow; yi is a factor affecting mine water inflow; \(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}{x}_{i}\) and \(\bar{y}=\frac{1}{n}\sum_{i=1}^{n}{y}_{i}\) are the corresponding sample means; n is the number of samples; r is the correlation coefficient.
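As a minimal illustration of Eq. (1), the following Python sketch computes the correlation coefficient between two series. The study itself reports using MATLAB for its fitting; Python is used here only for illustration. The function name `pearson_r` and the sample values are assumptions introduced for this sketch and are not taken from Table S1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences, Eq. (1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square roots of the sums of squared deviations.
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Illustrative values only (hypothetical, not from Table S1).
inflow = [10.0, 12.0, 15.0, 19.0]
precipitation = [20.0, 24.0, 30.0, 38.0]  # exactly twice the inflow values
print(pearson_r(inflow, precipitation))
```

Because the second series is an exact linear function of the first, the printed coefficient lies at the upper bound of the admissible range; a decreasing linear relation would give the lower bound.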

Based on 65 groups of measured data about the mine water inflow of Wulunshan Coal Mine from July 2012 to November 2017, a correlation analysis is conducted between mine water inflow and each influencing factor. The results are shown in Table 1.

Table 1 Correlation coefficient matrix of influencing factors

According to Table 1, the correlation coefficients between mine water inflow and influencing factors are between -0.76 and 0.61. Mine water inflow is positively correlated to precipitation, aquifer thickness, mining area, mining depth and mining thickness, and negatively correlated to driving footage.

Entropy method theory

The entropy method is an objective weighting method. It assigns a weight to each influencing factor according to the degree of dispersion of that factor's data: the more dispersed the values of a factor, the lower its information entropy and the larger its weight (Xue et al. 2021; Xu et al. 2020). There are three main steps to determine the weight of an influencing factor using the entropy method:

The raw data matrix is normalized. Suppose the raw data matrix of m influencing factors over n months is \(A={[{a}_{ij}]}_{m\times n}\). It is standardized with Eq. (2) for influencing factors positively correlated with mine water inflow, and with Eq. (3) for influencing factors negatively correlated with mine water inflow:

$${r}_{ij}=\frac{{a}_{ij}-\underset{j}{\mathrm{min}}\{{a}_{ij}\}}{\underset{j}{\mathrm{max}}\{{a}_{ij}\}-\underset{j}{\mathrm{min}}\{{a}_{ij}\}}$$
(2)
$${r}_{ij}=\frac{\underset{j}{\mathrm{max}}\{{a}_{ij}\}-{a}_{ij}}{\underset{j}{\mathrm{max}}\{{a}_{ij}\}-\underset{j}{\mathrm{min}}\{{a}_{ij}\}}$$
(3)

Where \(\underset{j}{\mathrm{min}}\{{a}_{ij}\}\) is the minimum value of the ith influencing factor; \(\underset{j}{\mathrm{max}}\{{a}_{ij}\}\) is the maximum value of the ith influencing factor.

Information entropy is defined. The information entropy of the ith influencing factor is as shown in Eq. (4).

$${E}_{i}=-\frac{1}{\mathrm{ln}n}\sum_{j=1}^{n}(\frac{{r}_{ij}}{\sum_{j=1}^{n}{r}_{ij}})\mathrm{ln}(\frac{{r}_{ij}}{\sum_{j=1}^{n}{r}_{ij}})$$
(4)

Where n is the total number of months; rij is the standard value of the ith influencing factor in the jth month.

The weight is defined. The weight of the ith influencing factor is as shown in Eq. (5).

$${w}_{i}=\frac{1-{E}_{i}}{m-\sum_{i=1}^{m}{E}_{i}}$$
(5)

Where \(0\le {w}_{i}\le 1\) and \(\sum_{i=1}^{m}{w}_{i}=1\); m is the total number of influencing factors; Ei is the information entropy of the ith influencing factor.
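The three steps of Eqs. (2) to (5) can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' MATLAB implementation: the function names, the treatment of 0·ln 0 as 0, and the small data matrix are assumptions made for this sketch.

```python
import math

def normalize(row, positive=True):
    """Min-max standardization of one factor: Eq. (2) if positive, Eq. (3) otherwise."""
    lo, hi = min(row), max(row)
    if hi == lo:
        raise ValueError("a constant factor cannot be normalized")
    if positive:
        return [(a - lo) / (hi - lo) for a in row]
    return [(hi - a) / (hi - lo) for a in row]

def entropy(r_row):
    """Information entropy of one standardized factor, Eq. (4)."""
    n = len(r_row)
    total = sum(r_row)
    e = 0.0
    for r in r_row:
        p = r / total
        if p > 0:  # assumption: the term 0 * ln(0) is taken as 0
            e -= p * math.log(p)
    return e / math.log(n)

def entropy_weights(data, positive_flags):
    """Weights of m factors from an m x n data matrix, Eq. (5)."""
    entropies = [entropy(normalize(row, flag))
                 for row, flag in zip(data, positive_flags)]
    m = len(entropies)
    denom = m - sum(entropies)
    return [(1 - e) / denom for e in entropies]

# Hypothetical 2-factor, 4-month matrix (not from Table S1).
demo = [[1.0, 2.0, 3.0, 4.0],   # treated as positively correlated with inflow
        [4.0, 1.0, 1.0, 1.0]]   # treated as negatively correlated with inflow
weights = entropy_weights(demo, [True, False])
print(weights, sum(weights))
```

By construction the weights sum to 1, and the factor whose standardized values are spread less evenly receives the larger weight.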

The weights of the influencing factors are determined using the entropy method, as shown in Table 2.

Table 2 The weights of influencing factors

It can be concluded from Table 2 that the influencing factors of mine water inflow for Wulunshan Coal Mine can be sorted by weight: precipitation > mining area > aquifer thickness > mining thickness > mining depth > driving footage.

Building of a weighted multiple nonlinear regression prediction model for mine water inflow

Building of multiple linear regression prediction model

Multiple regression analysis studies the relationship between one dependent variable and multiple independent variables based on the given values of the explanatory variables (Cohen 1968; Liu et al. 2019a, b). The functional expression can be either linear or nonlinear, depending on the relationship between the independent and dependent variables (Ouedraogo et al. 2019). MATLAB function programming is used to perform a multiple linear fit between water inflow and precipitation, aquifer thickness, mining area, mining depth, mining thickness and driving footage. The fitting parameters are shown in Table 3.

Table 3 Fitting parameters for multiple linear regression

According to Table 3, the multiple linear regression coefficients between water inflow and influencing factors are 0.05, 0.04, 1.21, 0.02, 12.43, and -0.01. The equation of the water inflow multiple linear regression prediction model is therefore obtained, as shown in Eq. (6).

$$Q=0.05P+0.04M+1.21A+0.02D+12.43T-0.01L-23.15$$
(6)

Where Q is water inflow (m³/h); P is precipitation (mm); M is aquifer thickness (m); A is mining area (10³ m²); D is mining depth (m); T is mining thickness (m); L is driving footage (m).
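For readers who wish to apply Eq. (6) directly, the sketch below wraps the published coefficients in a Python function. The function name `inflow_linear` and the input values are hypothetical and serve only to show how the units of each argument enter the formula.

```python
def inflow_linear(P, M, A, D, T, L):
    """Water inflow Q (m3/h) from the multiple linear regression model, Eq. (6).

    P: precipitation (mm), M: aquifer thickness (m), A: mining area (10^3 m2),
    D: mining depth (m), T: mining thickness (m), L: driving footage (m).
    """
    return (0.05 * P + 0.04 * M + 1.21 * A + 0.02 * D
            + 12.43 * T - 0.01 * L - 23.15)

# Hypothetical inputs for one month (not taken from Table S1).
print(inflow_linear(P=100, M=10, A=2, D=300, T=2, L=50))
```

Note that the negative intercept means the linear model can return negative inflow for small inputs, which has no physical meaning; this is one practical limitation of the linear form.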

Building of weighted multiple nonlinear regression prediction model

Scatter plots of water inflow against each influencing factor show highly nonlinear relationships. MATLAB function fitting is used to determine the unary nonlinear fitting curves between water inflow and the influencing factors, as shown in Figs. 5, 6, 7, 8, 9 and 10.

Fig. 5 Unary nonlinear fitting curve between precipitation and water inflow

Fig. 6 Unary nonlinear fitting curve between aquifer thickness and water inflow

Fig. 7 Unary nonlinear fitting curve between mining area and water inflow

Fig. 8 Unary nonlinear fitting curve between mining thickness and water inflow

Fig. 9 Unary nonlinear fitting curve between mining depth and water inflow

Fig. 10 Unary nonlinear fitting curve between driving footage and water inflow

The goodness of fit of each fitted function is evaluated with four indicators: Sum of Squares due to Error (SSE), Coefficient of Determination (R²), Adjusted Coefficient of Determination (Adjusted R²), and Root Mean Square Error (RMSE). The function fitting indicators are shown in Table 4.

Table 4 Function fitting indicators
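The four indicators can be computed as in the following Python sketch. The paper does not state the exact definitions used, so the degrees-of-freedom convention here (dividing SSE by n minus the number of fitted coefficients for the adjusted R² and the RMSE, as is common in curve-fitting software) is an assumption, as are the function name and the sample values. Under this convention the RMSE differs slightly from the prediction RMSE of Eq. (16), which divides by n.

```python
import math

def fit_indicators(measured, fitted, n_params):
    """Return SSE, R2, adjusted R2 and RMSE for a fitted curve.

    n_params is the number of fitted coefficients; the residual degrees of
    freedom are assumed to be n - n_params.
    """
    n = len(measured)
    dof = n - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted coefficients")
    mean_y = sum(measured) / n
    sse = sum((y - f) ** 2 for y, f in zip(measured, fitted))
    sst = sum((y - mean_y) ** 2 for y in measured)
    if sst == 0:
        raise ValueError("R2 is undefined for constant measured values")
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / dof) / (sst / (n - 1))
    rmse = math.sqrt(sse / dof)
    return {"SSE": sse, "R2": r2, "AdjR2": adj_r2, "RMSE": rmse}

# Hypothetical measured and fitted values for a two-coefficient curve.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.0, 4.1, 4.9]
print(fit_indicators(measured, fitted, n_params=2))
```

A smaller SSE and RMSE and an R² closer to 1 indicate a better fit; the adjusted R² penalizes functions with more coefficients.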

The unary nonlinear regression functions between water inflow and the influencing factors are selected according to the function fitting indicators and the fitting curves, as shown in Eqs. (7) to (12).

$$Q={a}_{1}{P}^{3}+{a}_{2}{P}^{2}+{a}_{3}P+{a}_{4}$$
(7)
$$Q={b}_{1}\mathrm{sin}({b}_{2}M+{b}_{3})$$
(8)
$$Q=\frac{{c}_{1}A+{c}_{2}}{{A}^{2}+{c}_{3}A+{c}_{4}}$$
(9)
$$Q={d}_{1}+{d}_{2}\mathrm{cos}(\lambda D)+{d}_{3}\mathrm{sin}(\lambda D)$$
(10)
$$Q={e}_{1}{T}^{3}+{e}_{2}{T}^{2}+{e}_{3}T+{e}_{4}$$
(11)
$$Q={f}_{1}{L}^{3}+{f}_{2}{L}^{2}+{f}_{3}L+{f}_{4}$$
(12)

Where Q is water inflow (m³/h); P is precipitation (mm); M is aquifer thickness (m); A is mining area (10³ m²); D is mining depth (m); T is mining thickness (m); L is driving footage (m); a, b, c, d, e, f and λ are parameters to be solved.

After the unary nonlinear regression functions are obtained, they are weighted by the entropy weights of the corresponding influencing factors and summed, so that the different importance of each factor to water inflow is fully considered, as shown in Eq. (13).

$$\begin{array}{c}Q=0.36({a}_{1}{P}^{3}+{a}_{2}{P}^{2}+{a}_{3}P+{a}_{4})+0.14({b}_{1}\mathrm{sin}({b}_{2}M+{b}_{3}))+0.27(\frac{{c}_{1}A+{c}_{2}}{{A}^{2}+{c}_{3}A+{c}_{4}})\\ +0.06({d}_{1}+{d}_{2}\mathrm{cos}(\lambda D)+{d}_{3}\mathrm{sin}(\lambda D))+0.13({e}_{1}{T}^{3}+{e}_{2}{T}^{2}+{e}_{3}T+{e}_{4})+0.04({f}_{1}{L}^{3}+{f}_{2}{L}^{2}+{f}_{3}L+{f}_{4})\end{array}$$
(13)

Where Q is water inflow (m³/h); P is precipitation (mm); M is aquifer thickness (m); A is mining area (10³ m²); D is mining depth (m); T is mining thickness (m); L is driving footage (m); a, b, c, d, e, f and λ are parameters to be solved.
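The structure of Eq. (13), a weighted sum of the six unary functions of Eqs. (7) to (12), can be sketched in Python as follows. The weights are those appearing in Eq. (13); all parameter values and inputs in the demonstration are hypothetical placeholders chosen so that every unary function returns 10, and do not correspond to the fitted values in Table 5.

```python
import math

# Entropy weights as they appear in Eq. (13); they sum to 1.
WEIGHTS = {"P": 0.36, "M": 0.14, "A": 0.27, "D": 0.06, "T": 0.13, "L": 0.04}

def cubic(x, coeffs):
    """Cubic polynomial k1*x^3 + k2*x^2 + k3*x + k4, the form of Eqs. (7), (11), (12)."""
    k1, k2, k3, k4 = coeffs
    return k1 * x ** 3 + k2 * x ** 2 + k3 * x + k4

def unary_terms(x, p):
    """Evaluate the six unary functions of Eqs. (7)-(12) for one set of inputs."""
    b1, b2, b3 = p["b"]
    c1, c2, c3, c4 = p["c"]
    d1, d2, d3 = p["d"]
    lam = p["lam"]
    return {
        "P": cubic(x["P"], p["a"]),                                           # Eq. (7)
        "M": b1 * math.sin(b2 * x["M"] + b3),                                 # Eq. (8)
        "A": (c1 * x["A"] + c2) / (x["A"] ** 2 + c3 * x["A"] + c4),           # Eq. (9)
        "D": d1 + d2 * math.cos(lam * x["D"]) + d3 * math.sin(lam * x["D"]),  # Eq. (10)
        "T": cubic(x["T"], p["e"]),                                           # Eq. (11)
        "L": cubic(x["L"], p["f"]),                                           # Eq. (12)
    }

def weighted_inflow(x, p):
    """Weighted sum of the unary functions, Eq. (13)."""
    terms = unary_terms(x, p)
    return sum(WEIGHTS[name] * terms[name] for name in WEIGHTS)

# Hypothetical parameters: each unary function evaluates to 10 for these inputs.
demo_params = {
    "a": (0.0, 0.0, 0.0, 10.0),
    "b": (10.0, 0.0, math.pi / 2),
    "c": (0.0, 10.0, 0.0, 1.0),
    "d": (10.0, 0.0, 0.0),
    "lam": 1.0,
    "e": (0.0, 0.0, 0.0, 10.0),
    "f": (0.0, 0.0, 0.0, 10.0),
}
demo_inputs = {"P": 50.0, "M": 30.0, "A": 0.0, "D": 200.0, "T": 2.0, "L": 100.0}
print(weighted_inflow(demo_inputs, demo_params))
```

Since the weights sum to 1, the weighted sum of six equal terms reproduces that common value, which is a convenient sanity check before real parameters are substituted.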

The parameters of the weighted multiple nonlinear regression prediction model are estimated by minimizing the residual sum of squares between the fitted values and the measured values. The parameter estimations for the influencing factors are shown in Table 5.

Table 5 Parameter estimations of the multiple nonlinear fitting function

The fitting parameter estimations of influencing factors are substituted into Eq. (13) to get the weighted multiple nonlinear regression prediction model equation, as shown in Eq. (14).

$$\begin{array}{c}Q=0.16P+0.99\mathrm{sin}(0.73M+3.6)+\frac{1.99A-0.04}{{(A-0.68)}^{2}-0.29}\\ +0.08\mathrm{cos}(0.95D)+2.69\mathrm{sin}(0.95D)\\ +1.29T(T+0.49)+0.07L-12.92\end{array}$$
(14)

Where Q is water inflow (m³/h); P is precipitation (mm); M is aquifer thickness (m); A is mining area (10³ m²); D is mining depth (m); T is mining thickness (m); L is driving footage (m).

Results

To evaluate the accuracy of the proposed prediction model, the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) are used to compare the predictions of the multiple linear regression model and of the weighted multiple nonlinear regression model with the measured values of mine water inflow. The error calculation formulas are Eqs. (15) and (16).

$$MAPE=\frac{100\%}{n}\sum_{i=1}^{n}|\frac{\widehat{{y}_{i}}-{y}_{i}}{{y}_{i}}|$$
(15)
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{(\widehat{{y}_{i}}-{y}_{i})}^{2}}$$
(16)

Where \(\widehat{{y_{i} }}\) is the predicted value; yi is the measured value; n is the number of samples.
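Eqs. (15) and (16) translate directly into code. The Python sketch below is illustrative only; the function names and the two sample values are assumptions and are not taken from Table 6.

```python
import math

def mape(measured, predicted):
    """Mean absolute percentage error in percent, Eq. (15)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in measured):
        raise ValueError("MAPE is undefined when a measured value is zero")
    n = len(measured)
    return 100.0 / n * sum(abs((p - y) / y) for y, p in zip(measured, predicted))

def rmse(measured, predicted):
    """Root mean square error in the units of the data, Eq. (16)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(measured, predicted)) / n)

# Hypothetical measured and predicted inflows (m3/h).
measured = [10.0, 20.0]
predicted = [11.0, 18.0]
print(mape(measured, predicted), rmse(measured, predicted))
```

MAPE is dimensionless and weights each sample by its relative error, whereas RMSE carries the units of water inflow and is dominated by the largest absolute errors, so the two indicators complement each other.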

The test sample data in Table S1 (supplementary material) are respectively substituted into the multiple linear regression prediction model Eq. (6), and the weighted multiple nonlinear regression prediction model Eq. (14) to obtain the water inflow prediction values of the two prediction models, as shown in Table 6.

Table 6 The water inflow prediction values of the two prediction models

According to the measured values and predicted values in Table 6, comparative curves for the two prediction models are made, as shown in Fig. 11.

Fig. 11 Comparative curves for the two prediction models

The measured and predicted values in Table 6 are substituted into Eqs. (15) and (16) to produce the error analysis tables for the two prediction models, as shown in Table 7.

Table 7 Error analysis of predicted values

According to Fig. 11, the predicted values of the multiple linear regression prediction model are generally larger than the measured values, and the overall trend of its predictions is not consistent with the measured trend. In contrast, the predicted values of the weighted multiple nonlinear regression prediction model agree well with the measured values, and the overall trend of its predictions closely follows the measured trend. However, the model fits poorly when water inflow is high. This may be because the training samples contain few records of extreme water inflow, so that extreme situations are not adequately represented during training. The error analysis results in Table 7 show that the MAPE of the weighted multiple nonlinear prediction model is 16.44%, a great improvement over the 45.88% of the multiple linear regression prediction model. The RMSE of the weighted multiple nonlinear prediction model is only 4.67 m³/h, significantly lower than that of the multiple linear regression model.

Conclusion

This article analyzes the typical hydrogeological conditions in coal mines and, based on previous research, determines the main factors affecting mine water inflow to be precipitation, aquifer thickness, mining area, mining depth, mining thickness, and driving footage. Correlation analysis further shows that driving footage is negatively correlated with mine water inflow, while the other factors are positively correlated. Based on the entropy method, the weights of the influencing factors are ranked as follows: precipitation > mining area > aquifer thickness > mining thickness > mining depth > driving footage. Using multivariate regression theory, scatter analysis and MATLAB function programming, the article builds a weighted multivariate nonlinear regression model for predicting mine water inflow based on the calculated factor weights. This model considers both the impact of multiple factors on mine water inflow and the differences in factor importance. A comparison of the weighted multivariate nonlinear regression model, the multivariate linear regression model, and the measured values of water inflow shows that the newly established prediction model can overcome the shortcomings of existing methods, reduce the prediction error caused by a low degree of hydrogeological survey, and improve the prediction accuracy.