Introduction

Coal is an important primary energy source and underpins the development of China’s economy and society. However, coal mining and associated activities can have serious ecological and environmental impacts that are particularly problematic in environmentally sensitive areas. The Yulin, Shenmu, and Fugu mine areas in northern Shaanxi Province (i.e., the Yushenfu mine area) (Song 2009; Dou 2011, 2012) in China are located on the border between the Loess Plateau and the Maowusu Desert. This is one of the most important large-scale coal mining areas in China, which produces 100 Mt/a of coal. The Yushenfu mine area is an environmentally sensitive region characterized by arid and semi-arid conditions, desertification, soil erosion, and scarce water resources. An important research question concerns how large-scale coal mining impacts the local environment and ecology, and how these impacts might be better classified and quantified.

The prediction and evaluation of mine-associated environmental impacts are important areas of research worldwide, and a number of approaches have been developed to undertake such research, including fuzzy mathematics (Liao and Wu 2004), analytical hierarchy processes (Chen et al. 2006), matter-element analysis (Feng et al. 2009), the back propagation (BP) neural network method (Li et al. 2015), the comprehensive index method (Lei et al. 2005), support vector machine (SVM) modeling (Dou 2011, 2012), the geographic information system (GIS) method (Shan 2013), and Fisher discriminant analysis (FDA) (Zhang et al. 2014). These various approaches each have unique methodologies and limitations. For example, the fuzzy mathematics method does not meet “non-negative bounded, additive, and polarity” measurement standards; the max–min principle has deficiencies related to the subjectivity of the weightings used in the expert scoring method (Cao et al. 2007); neural networks are prone to becoming trapped in local optima and have limited ability to solve small sample problems; and the selection of kernel functions and parameters for SVM models is complex (Shao and Xu 2015). Notably, most of these methods do not consider the problem of misjudgment related to the superposition of information from different discriminant indexes.

Given these limitations, we used principal component analysis (PCA) to refine the discriminant index data that affect environmental quality in mines, converting multiple interrelated index variables into new, independent sample indexes through linear combinations. This effectively extracts the variation in the original discriminant indexes and eliminates the effects caused by the superposition of information among the indexes, which results in a more robust characterization of different mine environments. The processed data are then subjected to Fisher discriminant analysis (FDA), which enables the environmental and ecological impacts of coal mining to be predicted. We applied this method to the Yushenfu mine area and show that it produces reliable results and is thus applicable to other coal mining areas in northwest China.

Methodology

PCA method

The PCA method is a data compression and feature information extraction technique. This method can effectively eliminate the correlation between high-dimensional datasets, reduce the data dimensions, and simplify the data structure. In the simplification process, multiple related variables are combined, based on the principle of minimizing information loss.

PCA mathematical model

The p variables X1, X2, ⋯, Xp of the original data matrix X are used for a linear combination, which is expressed as

$$ \left\{ \begin{array}{l} {Y}_1={a}_{11}{X}_1+{a}_{12}{X}_2+\cdots +{a}_{1p}{X}_p \\ {Y}_2={a}_{21}{X}_1+{a}_{22}{X}_2+\cdots +{a}_{2p}{X}_p \\ \qquad \vdots \\ {Y}_p={a}_{p1}{X}_1+{a}_{p2}{X}_2+\cdots +{a}_{pp}{X}_p \end{array} \right. $$
(1)

In Eq. (1), \( {a}_{i1}^2+{a}_{i2}^2+\cdots +{a}_{ip}^2=1 \); Yi and Yj (i ≠ j; i, j = 1, 2, ⋯, p) are uncorrelated; Y1 has the largest variance of all linear combinations of (X1, X2, ⋯, Xp); and Y2 has the largest variance of all linear combinations of (X1, X2, ⋯, Xp) that are uncorrelated with Y1, and so on for the remaining components. The sum of the variances of Y1, Y2, ⋯, Yp is equal to the sum of the variances of X1, X2, ⋯, Xp.

Solving the principal component

The general steps for solving the principal components are as follows. (a) The original variable data are standardized, and then the covariance matrix ∑ = (Sij)p × p between the variables is calculated. (b) The eigenvalues λi of the covariance matrix and the corresponding orthonormal eigenvectors are calculated. The m largest eigenvalues, λ1 ≥ λ2 ≥ ⋯ ≥ λm ≥ 0, are the variances of the first m principal components, and the elements of the unit eigenvector ai corresponding to λi are the coefficients of the original variables in the ith principal component. The variance contribution rate of the ith principal component, written here as αi to distinguish it from the eigenvector ai, is expressed as follows:

$$ {\alpha}_i={\lambda}_i/\sum \limits_{j=1}^p{\lambda}_j $$
(2)

The number of principal components retained generally depends on the cumulative variance contribution rate. When this rate exceeds 80% for the first m principal components, these m components are taken as the discriminant indexes (Lu et al. 2012; Qian et al. 2016).
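
The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `pca_by_eigendecomposition`, the synthetic data, and the default threshold argument are assumptions introduced here, and the study's own data (Table 2) are not reproduced.

```python
import numpy as np

def pca_by_eigendecomposition(X, threshold=0.80):
    """Hypothetical helper: PCA via eigen-decomposition of the covariance matrix."""
    X = np.asarray(X, dtype=float)
    # (a) Standardize each variable to zero mean and unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix of the standardized data (equal to the correlation matrix).
    S = np.cov(Z, rowvar=False)
    # (b) Eigenvalues and orthonormal eigenvectors of the symmetric matrix S.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Variance contribution rate of each component and its cumulative sum.
    rate = eigvals / eigvals.sum()
    cumulative = np.cumsum(rate)
    # Smallest m whose cumulative contribution rate exceeds the threshold.
    m = int(np.argmax(cumulative > threshold)) + 1
    scores = Z @ eigvecs[:, :m]                # principal component scores
    return eigvals, eigvecs, cumulative, m, scores

# Synthetic demonstration: five correlated indexes driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
X_demo = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(30, 5))
eigvals, eigvecs, cumulative, m, scores = pca_by_eigendecomposition(X_demo)
```

Because the data are standardized first, the eigenvalues sum to the number of variables, and the sample variance of each score column equals the corresponding eigenvalue.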

FDA theory

The basic principle of FDA is that high-dimensional data points are projected onto low-dimensional space, such that the data points become more densely clustered. The projection is optimized to best discriminate classes of data, and the discriminant analysis function classifies these classes according to the largest distance between them and the smallest distance within a class. The model can then classify and discriminate data for new samples.

Solution of the FDA model

Let G1, G2, ⋯, Gk be a set of k populations from which n1, n2, ⋯, nk samples are taken, respectively, so that n = n1 + n2 + ⋯ + nk, and let \( {x}_a^{(i)}=\left({x}_{a1}^{(i)},\cdots, {x}_{ap}^{(i)}\right) \) be the observation vector for sample a of population i. The discriminant function is set to \( y(x)={c}_1{x}_1+\cdots +{c}_p{x}_p\triangleq {c}^Tx \), where c = (c1, ⋯, cp)T and x = (x1, ⋯, xp)T. \( {\overline{x}}^{(i)} \) and \( {s}^{(i)} \) are the sample mean and covariance matrix of x in population Gi, respectively, and the sample mean and variance of y(x) in Gi are \( {\overline{y}}^{(i)}={c}^T{\overline{x}}^{(i)} \) and \( {\sigma}_i^2={c}^T{s}^{(i)}c \). If \( \overline{x} \) is the total mean vector, it then follows that \( \overline{y}={c}^T\overline{x} \).

The Fisher criterion uses an analysis of variance to select the coefficient vector c such that \( \lambda =\sum \limits_{i=1}^k{n}_i{\left({\overline{y}}^{(i)}-\overline{y}\right)}^2/\sum \limits_{i=1}^k\left({n}_i-1\right){\sigma}_i^2=\frac{c^T Bc}{c^T Ec} \) is maximized, where E is the within-group sum of squares of deviations and B is the between-group sum of squares of deviations, given by \( E=\sum \limits_{i=1}^k\left({n}_i-1\right){s}^{(i)} \) and \( B=\sum \limits_{i=1}^k{n}_i\left({\overline{x}}^{(i)}-\overline{x}\right){\left({\overline{x}}^{(i)}-\overline{x}\right)}^T \).

For λ to attain its maximum, the necessary condition for an extreme value, ∂λ/∂c = 0, must hold, which leads to the generalized eigenvalue problem Bc = λEc. Algebraically, the number m of non-zero eigenvalues of this equation cannot exceed min(k − 1, p) and, because B is non-negative definite, the non-zero eigenvalues must be positive. As such, λ1 ≥ λ2 ≥ ⋯ ≥ λm > 0, and m discriminant functions can be constructed, which are of the form yl(x) = (c(l))Tx (l = 1, ⋯, m). The first discriminant function generally captures most of the sample information, but if it alone does not discriminate clearly, it can be considered in conjunction with subsequent discriminant functions.
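
As a rough illustration of how Bc = λEc might be solved numerically, the sketch below builds the within-group and between-group matrices from labeled samples and takes the eigen-decomposition of E⁻¹B. The function name `fisher_directions` and the synthetic three-class data are assumptions made for this example; the scaling of the coefficient vectors is whatever NumPy returns (unit length), which may differ from the normalization used by a statistics package.

```python
import numpy as np

def fisher_directions(X, labels):
    """Hypothetical helper: eigenvalues and coefficient vectors of B c = lambda E c."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    E = np.zeros((p, p))   # within-group sum of squares and cross-products
    B = np.zeros((p, p))   # between-group sum of squares and cross-products
    for g in classes:
        Xg = X[labels == g]
        mean_g = Xg.mean(axis=0)
        centered = Xg - mean_g
        E += centered.T @ centered
        diff = (mean_g - grand_mean)[:, None]
        B += len(Xg) * (diff @ diff.T)
    # E^{-1} B c = lambda c is equivalent to B c = lambda E c when E is invertible.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(E, B))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1]
    m = min(len(classes) - 1, p)   # at most min(k - 1, p) non-zero eigenvalues
    return eigvals[order][:m], eigvecs[:, order][:, :m], E, B

# Synthetic demonstration: three groups in two dimensions.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 1.0], [1.0, 5.0]])
X_demo = np.vstack([c + 0.5 * rng.normal(size=(10, 2)) for c in centers])
labels_demo = np.repeat([1, 2, 3], 10)
lams, C, E, B = fisher_directions(X_demo, labels_demo)
```

Each column of `C` is one coefficient vector c(l), ordered by decreasing eigenvalue.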

Discriminating criterion

For each discriminant function, there is an index contribution rate \( {p}_l={\lambda}_l/\sum \limits_{i=1}^m{\lambda}_i \) that measures its discriminating ability. The cumulative discriminating ability of the first r (r ≤ m) discriminant functions is then defined as \( {P}_r=\sum \limits_{l=1}^r{\lambda}_l/\sum \limits_{i=1}^m{\lambda}_i \) and, if Pr > 0.85, then the first r discriminant functions can be used to discriminate. Two weighting methods can be used in the discrimination analysis, which are described as follows.

Unweighted method: Let \( {\overline{y}}_l^{(i)}={\left({c}^{(l)}\right)}^T{\overline{x}}^{(i)}\ \left(l=1,\cdots, r;\ i=1,\cdots, k\right) \) be the group centers and let the unknown sample be x = (x1, x2, ⋯, xp)T. Then \( {y}_l(x)={\left({c}^{(l)}\right)}^Tx \) and \( {D}_i^2=\sum \limits_{l=1}^r{\left[{y}_l(x)-{\overline{y}}_l^{(i)}\right]}^2\ \left(i=1,\cdots, k\right) \). If \( {D}_t^2=\underset{1\le i\le k}{\min }{D}_i^2 \), then x ∈ Gt.

Weighted method: Considering that each discriminant function has a different discriminating ability, let \( {D}_i^2=\sum \limits_{l=1}^r{\left[{y}_l(x)-{\overline{y}}_l^{(i)}\right]}^2{\lambda}_l\ \left(i=1,\cdots, k\right) \), where λl is the eigenvalue obtained by solving Bc = λEc. If \( {D}_t^2=\underset{1\le i\le k}{\min }{D}_i^2 \), then x ∈ Gt (Zhong and Guo 2008; Hu et al. 2010; Wang 2010; Zhou and Shi 2010; Fu and Wang 2012; Lu et al. 2012; Hou et al. 2016).
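
The two distance rules can be compared on a toy case. In the sketch below, the function `classify`, the identity coefficient matrix, the eigenvalues, and the group means are all made-up values chosen only to show that the weighted and unweighted rules can assign the same sample to different groups; group indices are 0-based.

```python
import numpy as np

def classify(x, C, eigvals, group_means, weighted=False):
    """Hypothetical helper: assign x to the group with the smallest D_i^2.

    C           : (p, r) matrix whose columns are the coefficient vectors c^(l)
    eigvals     : (r,) eigenvalues lambda_l of B c = lambda E c
    group_means : (k, p) matrix whose rows are the group mean vectors
    """
    y = C.T @ np.asarray(x, dtype=float)      # y_l(x), l = 1..r
    centers = group_means @ C                 # group centers in discriminant space
    weights = eigvals if weighted else np.ones_like(eigvals)
    d2 = ((y - centers) ** 2 * weights).sum(axis=1)
    return int(np.argmin(d2)), d2

# Made-up illustration with r = 2 discriminant functions and k = 3 groups.
C_demo = np.eye(2)
eig_demo = np.array([9.0, 1.0])
means_demo = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
x_new = np.array([0.4, 1.1])

group_unweighted, _ = classify(x_new, C_demo, eig_demo, means_demo)
group_weighted, _ = classify(x_new, C_demo, eig_demo, means_demo, weighted=True)
```

Here the unweighted rule picks the third group (index 2), whereas the weighted rule picks the first group (index 0), because the weighting emphasizes the first discriminant function, along which the sample lies closer to the first group.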

Previous studies have shown that FDA places no restrictions on the populations or their distributions (He 2004; Yuan and Song 2009; Hu et al. 2010; Wang 2010), provided that the population means and covariance matrices have been calculated and the population covariance matrix is invertible. In such cases, the statistical test can be omitted. In general, if the back-substitution success rate is > 80%, the FDA model is considered reliable. In this test, all n training samples are treated as new samples and classified using the discriminant functions and criterion. The ratio of the number N of misjudgments to the total number of samples is the misjudgment rate γ, which is calculated as \( \gamma =N/\sum \limits_{i=1}^k{n}_i \).
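
A back-substitution check of this kind amounts to counting mismatches between known and predicted classes. The following sketch uses invented labels purely to show the arithmetic; `misjudgment_rate` is a hypothetical helper name.

```python
import numpy as np

def misjudgment_rate(true_labels, predicted_labels):
    """Hypothetical helper: gamma = N / (total number of samples)."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    n_wrong = int(np.sum(true_labels != predicted_labels))   # N
    return n_wrong / true_labels.size

# Invented example: ten training samples, one of which is misclassified.
actual = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
predicted = [1, 1, 2, 3, 3, 3, 4, 4, 5, 5]
gamma = misjudgment_rate(actual, predicted)
reliable = (1.0 - gamma) > 0.80   # the 80% success-rate rule of thumb
```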

Discriminant model and its application to the prediction of environmental and ecological impacts of coal mining

Geological background

The Yushenfu mine area is located in the central Ordos Basin, between the Loess Plateau and the Maowusu Desert. It covers the three administrative districts of Yuyang district, Shenmu city, and Fugu county. It consists of two mine areas: Yushen and Shenfu (Fig. 1). The area is dominated by hilly sandy and loess landforms. The main strata exposed in the study area are the Upper Triassic Yongping Formation (T3y), Lower Jurassic Fuxian Formation (J1f), Middle Jurassic Yan’an (J2y), Zhiluo (J2z), and Anding formations (J2a), Lower Cretaceous Tuoluo Formation (K1l), Pliocene Sanzhima Formation (N2s), lower Pleistocene Sanmen Formation (Q1s), middle Pleistocene Lishi Formation (Q2l), upper Pleistocene Salawusu (Q3s) and Malan formations (Q3m), and Holocene aeolian sand (Q4eol) and alluvium (Q4al). The study area is located in the northern Shaanxi slope along the northeastern edge of the Ordos Basin. The structure is simple, and faults and folds are not developed at the surface in this region. The surface geology is dominated by Quaternary strata. The main coal-bearing strata in the mine area are in the Yan’an Formation (J2y), which is divided into five coal sections (from top to bottom: J2y5, J2y4, J2y3, J2y2, and J2y1). There are five layers of coal seams in each of the Shenfu and Yushen mine areas (1−2, 2−2, 3−1, 4−2, and 5−2, and 2−2, 3−1, 4−3, 5−2, and 5−3, respectively). The coal strata are better exposed and more shallowly buried in the eastern Shenfu mine area. The overburden thickness of the first coal seam 2−2 is < 100 m. Given the general westward dip of the strata, the burial depth in the Yushen mine area is greater, and the overburden thickness of the main coal seam 2−2 is 100–400 m. In most of the Shenfu mine area and east of the Yushen mine area, the mining conditions of the 2−2 coal seam are unusual, as the mined seam is shallowly buried beneath a thick and loose aquifer. The overburden lithology is generally a hard rock type. 
In particular, where the overburden structure is of the burnt rock type or the sand base type, the water-conducting fracture zone is well developed and severely damages the highly water-rich aquifer. Therefore, destruction of the aquifer due to coal mining will have a serious impact on the water quality of the entire mine area, which will lead to serious ecological and environmental impacts (Song 2009; Dou 2011; Hou et al. 2016). As such, the Yushen and Shenfu mine areas are located in an environmentally sensitive region. Recent intensification of mining activity in this region is likely to lead to environmental impacts, such as destruction of surface vegetation, ground subsidence, degradation of water resources, atmospheric pollution, desertification, and intensification of soil erosion.

Fig. 1 Location of the Yushenfu mine area in Shaanxi Province, China

Model input parameters

Numerous natural and engineering factors govern the environmental impacts of coal mining (Dou 2011, 2012). These can be divided into natural factors (phreatic water depth, water quality, rock and soil type, topography, water resources, thickness and depth ratio of coal seams, geomorphology, and climatic conditions) and social factors (transport, mining, mineral processing, and ore washing). Based on the regional characteristics of the Yushenfu mine area, expert opinion, and local conditions, five factors affecting the environmental impacts in this area were selected for use in this study: (1) geomorphology (X1); (2) depth to the phreatic water layer (X2); (3) sand layer thickness of the phreatic water region (X3); (4) thickness of the overlying bedrock on the uppermost coal seam (referred to as the thickness of bedrock) (X4); and (5) thickness of the uppermost coal seam (X5). Among these inputs, geomorphology is a state parameter that must be described and quantified. The depth to the phreatic water layer, sand layer thickness of the phreatic water region, thickness of the bedrock, and thickness of the uppermost coal seam are measured values. The specific classification criteria and evaluation are listed in Table 1. The environmental impacts of coal mining are classified into groups I–V, which represent good, better, medium, poor, and extremely poor quality, respectively (Dou 2011, 2012).

Table 1 Classification and evaluation of the main factors affecting the environmental impacts of mining

Learning samples and establishment of the discriminant analysis model

Thirty sets of representative environmental impact/quality data were selected from the Yushenfu mine area, and 23 datasets were used as training samples for the FDA model (Table 2); 7 other datasets were used for discrimination testing (Dou 2011).

Table 2 Sample learning data for the discriminant analysis model

PCA data processing

The data in Table 2 were first standardized, and then the standardized data were processed by PCA. The correlation coefficient matrix for the indexes is provided in Table 3. There is a clear correlation between the five discriminant indexes that affect the mine environment, such as between the geomorphological type and sand layer thickness of the phreatic water (r = 0.734), and the information from the sample indexes overlaps. If the quality of the mine environment is discriminated using these five indexes, the prediction model will be affected and could lead to misjudgment. Therefore, it is necessary to perform PCA before input of the sample data. After PCA processing of the data, two principal components are extracted according to the distribution rules of the information content of each principal component in the PCA scree plot (Fig. 2). The selected principal components contain 83.93% of the information in the original data.

Table 3 Pearson correlation coefficient matrix for each index

Fig. 2 PCA scree plot

According to the PCA matrix, the relationship between the new principal components Y1 and Y2 and the original variables is as follows:

$$ {Y}_1=0.54{X}_1+0.47{X}_2-0.49{X}_3-0.49{X}_4+0.11{X}_5 $$
(3)
$$ {Y}_2=-0.17{X}_1+0.47{X}_2+0.31{X}_3+0.14{X}_4+0.8{X}_5 $$
(4)
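
Eqs. (3) and (4) can be applied to a standardized sample as a single matrix product. In the sketch below, the coefficients are copied from the two equations (reading the second term of Eq. (3) as the X2 term), whereas the input vector is a hypothetical standardized sample and not a row of Table 2.

```python
import numpy as np

# Coefficients from Eqs. (3) and (4); rows are Y1 and Y2, columns are X1..X5.
A = np.array([
    [0.54, 0.47, -0.49, -0.49, 0.11],
    [-0.17, 0.47, 0.31, 0.14, 0.80],
])

def principal_components(x_std):
    """Map a standardized index vector (X1..X5) to (Y1, Y2)."""
    return A @ np.asarray(x_std, dtype=float)

# Hypothetical standardized sample, for illustration only.
x_example = np.array([1.0, -0.5, 0.2, 0.0, 1.5])
y1, y2 = principal_components(x_example)

# Consistency checks on the published coefficients.
row_norms = np.linalg.norm(A, axis=1)   # close to 1 for unit eigenvectors
cross = A[0] @ A[1]                     # close to 0 for orthogonal eigenvectors
```

For this input, Y1 = 0.372 and Y2 = 0.857. The two coefficient vectors have lengths close to 1 and an inner product close to 0 (about −0.003), which is what would be expected of rounded orthonormal eigenvectors.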

Fisher discriminant analysis

The five environmental quality categories (I–V) were treated as five different populations, and it was assumed that the covariance matrices of the five populations are equal. The two principal components Y1 and Y2 obtained by PCA were taken as the two discriminant indexes for the FDA model, and the Fisher discriminant functions were obtained as follows:

$$ {Z}_1=1.023{Y}_1-0.025{Y}_2 $$
(5)
$$ {Z}_2=0.244{Y}_1+0.994{Y}_2 $$
(6)

The discriminability of the two discriminant functions is significant (Table 4). The first discriminant function corresponds to an eigenvalue of 27.081, and its variance contribution rate (discrimination efficiency) is 93.4%; that is, it explains 93.4% of the sample information. Its canonical correlation coefficient is high (0.982), and this function can be used to discriminate most samples. When both discriminant functions are used, 100% of the sample information can be explained. Therefore, when Eq. (5) does not yield a clear judgment on sample classification, it can be combined with Eq. (6) to obtain a clear judgment.
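
The numbers quoted above can be cross-checked with a short calculation. The sketch below evaluates Eqs. (5) and (6) for a hypothetical pair of principal component values and computes the canonical correlation implied by the first eigenvalue through the standard relation r = sqrt(λ/(1 + λ)). The function names and the input values are assumptions for illustration; only the coefficients and the eigenvalue 27.081 come from the text.

```python
import math
import numpy as np

# Coefficients from Eqs. (5) and (6); rows are Z1 and Z2, columns are Y1 and Y2.
W = np.array([
    [1.023, -0.025],
    [0.244, 0.994],
])

def discriminant_scores(y):
    """Map principal component values (Y1, Y2) to discriminant scores (Z1, Z2)."""
    return W @ np.asarray(y, dtype=float)

def canonical_correlation(eigenvalue):
    """Canonical correlation implied by a discriminant eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))

z1, z2 = discriminant_scores([0.372, 0.857])   # hypothetical Y1, Y2 values
r1 = canonical_correlation(27.081)             # eigenvalue quoted in the text
```

The result, `r1` ≈ 0.982, matches the correlation coefficient reported for the first discriminant function.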

Table 4 Eigenvalues and group center values of the Fisher discriminant functions

To initially test the validity and accuracy of the PCA and FDA model, the 23 sets of training sample data were reclassified by back-substitution using the established model, and the results were compared with the actual classes (Table 2). Figure 3 shows the groupings obtained with the first and second discriminant functions, and the overall performance of the two functions is evident in the figure. The misjudgment rate in this test was zero (Table 2). As such, the model appears stable and reliable, and it correctly classified the 23 datasets into the quality groupings (good, better, medium, poor, and extremely poor). The classification resulted in small distances within groups and large distances between groups. Therefore, the combined PCA and FDA model is able to robustly predict and classify the environmental impacts of mining in this area.

Fig. 3 Groupings of sample data using the first and second discriminant functions

Model test and application

Another seven sets of sample data that were not used in training the PCA and FDA model were selected to further test the model. The calculated results are listed in Table 5 and were compared with those of a nonlinear model based on SVM (Dou 2011). The discriminant results are exactly the same as the actual results (i.e., the classification accuracy rate is 100% and the misjudgment rate is zero). Therefore, this model is valid and reliable for making comprehensive predictions and classifications of environmental impact/quality in the mine areas. In addition, the method is relatively simple and stable, does not change with the number of sets of sample data, and has a high learning efficiency. Moreover, the method not only effectively characterizes the impacts of coal mining in an environmentally sensitive area with a small number of index variables, but also better eliminates the influence of superimposed information between the variables.

Table 5 Test results of the PCA and FDA predictive model

Conclusions

  1. The impacts on environmental quality of coal mining in an environmentally sensitive area were evaluated by PCA and FDA. PCA was mainly used to reduce the dimensionality of the variables and extract the most useful information to facilitate the subsequent processing of the algorithm. FDA enables different samples to be distinguished. The combination of the two methods makes it possible to discriminate and analyze the samples.

  2. This model was applied to the Yushenfu mine area and yielded excellent discrimination results. This indicates that the model is robust, simple, and feasible, given that it has a high prediction accuracy and misjudgment rate of zero. As such, this new method can robustly assess and classify environmental and ecological quality and the impacts of coal mining.

  3. The PCA and FDA model is based on measured engineering and natural data, and the representativeness and accuracy of these data can affect the model outcome. The environment in the mine areas is complex and influenced by many factors. Therefore, for practical application of our model, extensive and appropriate sample datasets need to be collected and the model needs to be trained. The accuracy of the model will improve as more sample datasets are processed.