Introduction

Opencast coal mining is one of the most important mainstay industries in the energy market, and opencast coal mines are primarily located in vulnerable environments in Northwestern China, such as Shanxi Province, Inner Mongolia, and Shaanxi Province (Wang et al. 2013). The surface soil and vegetation in these areas have been destroyed by opencast mining, leading to the degradation of the local ecological environment and the loss of natural conditions (Jacinthe and Lal 2006; Wang et al. 2014). To achieve optimal productivity of the mine soils, the most fundamental and important step is to construct optimal physical, chemical, and biological soil conditions (Nikolic and Nikolic 2012; Papadopoulou-Vrynioti et al. 2014). Therefore, it is important to understand reconstructed soil properties and their spatial distribution to select the appropriate land reclamation measures. Moreover, designing site-specific management strategies for precision reclamation, or applying simulation modeling, requires soil data at a much finer scale, which are obtained primarily by detailed on-site sampling across the mapping units and land use or management practices (Gaston et al. 2001; Kariuki et al. 2009). The soil monitoring network plays an important role in detecting the spatial distribution of soil properties and in making sustainable land reclamation decisions (Liu et al. 2014). Therefore, it is necessary to construct a monitoring network of reconstructed soil properties and to track the changes in soil properties over reclamation time.

Opencast coal mining is an anthropogenic activity that changes the antecedent soil profile and the physical, chemical, and biological soil properties (Shukla et al. 2004a). Soil disturbance caused by mining leads to the loss of aggregation, the destruction of soil structure, increased bulk density, reduced porosity, and subsoil contamination (Shukla et al. 2004b). To interpret and correctly predict the effects of mining activities in a mining area, an understanding of the level of degradation and the spatial distribution of soil properties in the area is essential. Geostatistics is a useful tool for analyzing spatial variability, interpolating between point observations, and ascertaining the interpolated values with a specified error using a minimum number of observations (Burrough et al. 1997; Burrough 2001).

Reconstructed soils are generally heterogeneous, and this heterogeneity stems from partial mixing and irregular spreading of topsoil materials (Jacinthe and Lal 2006). Several attempts have been made to identify the variability of soil properties using a geostatistics method for agricultural soil (Hu et al. 2008; Wang et al. 2010; Barik et al. 2014; Hu et al. 2014), and only a few accounts exist for reconstructed soils in mining areas (Akala and Lal 2001; Shukla et al. 2004b; Shukla et al. 2007; Nyamadzawo et al. 2008). The previous studies mainly analyzed the spatial variability of the fertility index (including total organic carbon and total nitrogen) for reconstructed soils after land reclamation and the effect of land reclamation on soil properties (Schroeder 1995; Bendfeldt et al. 2001; Shukla and Lal 2005; Nyamadzawo et al. 2008). However, the effects of mining activities on reconstructed soil properties and spatial variability caused by mining were not thoroughly investigated, and some physical properties (including soil particle distribution and soil penetration resistance) were not evaluated.

For reconstructed soils, this variability could be further amplified due to the random distribution of soil properties introduced by mining activities. Heavy machinery is commonly used during dumping, and consequently, these soils tend to be compacted. Both the volume and geometry of soil macropores are negatively affected by compaction (Schjonning and Rasmussen 2000). Thus, the objectives of this study were to (i) assess the spatial variability of soil properties (including soil particle distribution and soil penetration resistance) on reconstructed soils after dumping and before reclamation using a geostatistics method and to analyze the effects of opencast mining activities on soil properties and (ii) to determine a monitoring network of soil properties based on a geostatistical method to track the changes in soil properties after land reclamation.

Study site

The study area was an opencast coal mine in Shanxi Pingshuo, which is the largest opencast coal mining area in China and includes the Antaibao, Anjialing, and East opencast mines. The Pingshuo opencast coal mine is located along the border of Shanxi Province, Shaanxi, and Inner Mongolia of the east Loess Plateau with the geographic coordinates 112° 17′ 28″∼112° 28′10″ E, 39° 25′ 6″∼39° 36′5″ N, as shown in Fig. 1.

Fig. 1 Schematic diagram of the geographical location

This mining area has a typical temperate arid to semi-arid continental monsoon climate and a fragile ecological environment. The altitude of the original landform is 1300–1500 m, and the terrain is loess hills with grass vegetation. The average annual rainfall is approximately 450 mm, with 65% falling from June to September. The average annual evaporation, however, is approximately 2160 mm, 4.6 times more than the rainfall. This region was once primarily a landscape of forest and prairie; however, during the last 200 years, damage to the primary vegetation has led to chronic erosion problems. Its chestnut soils are characterized by low levels of organic matter and poor structure. Extensive mining activities have further worsened the fragile eco-environmental situation in this area. The original landform, geological strata, and ecosystem no longer exist.

The soils of the original landform in this mining area are characterized by thick topsoil and low fertility. Opencast mining activities, such as excavation, transport, and dumping, have significantly disturbed the soils, and the soil profile has been greatly changed. Land reclamation in the Pingshuo opencast coal mine began 20 years ago (Bi et al. 2010). The top 0–100 cm of soil was removed during the opencast mining operations, protected from wind and water erosion, and stored separately. After dumping, the 100-cm-thick topsoil was used to cover the surface of the dump, and some of the reclaimed area has been used for agriculture (Li et al. 2012). The specific study area is located in the inner dump of the Antaibao mine and has an area of 0.44 km². The study site was on the top platform of the inner dump at an altitude of 1474–1480 m. The site was dumped in 2012, and no vegetation had been planted.

Methodology

Soil sampling and analysis

In June 2013, equally spaced soil samples at 78 sampling sites were collected using an auger at the depths of 0–20, 20–40, 40–60, and 60–80 cm (Fig. 2). The sampling sites were randomly arranged within a distance of 60–80 m. All soil samples were air-dried, and the clods were broken using steel rolling pins so that the soil could pass through a 2-mm mesh. Soil particle distribution and soil compaction were measured at the depths of 0–20 and 20–40 cm, and soil pH and total dissolved salt (TDS) were measured at all soil depths. Soil particle distribution was analyzed using a laser particle size analyzer, the Longbench Mastersizer 2000 (Malvern Instruments, Malvern, England). Soil pH was determined with a potentiometer using 1:5 water extracts. Soil electrical conductivity (EC) was measured with a soil TDS meter TDS11 (Lovibond, Germany), and TDS was determined by the empirical equation developed by Marion and Babcock (1976). Soil penetration resistance (PR) was determined using a penetrometer TJSD-750-II (Top Instruments, Hangzhou, Zhejiang, China).

Fig. 2 Layout of the soil sampling points in the study area

Statistical analysis

Descriptive statistics, including the mean, median, standard deviation, coefficient of variation (CV), maximum, minimum, and the Kolmogorov-Smirnov (K-S) test were obtained for each measured soil variable using SPSS 19.0. Geostatistical methods were used to study the spatial variability of the reconstructed soil properties (Goovaerts 1998; Akramkhanov et al. 2014). Geostatistics is based on the theory of a regionalized variable, which is distributed in space with spatial coordinates and shows spatial autocorrelation such that samples close together in space are more alike than those that are further apart (Black et al. 2014).

The geostatistics approach consists of the following two parts: the first is the calculation of an experimental variogram from the data and model fitting, and the second is the estimation at unsampled locations (Jang et al. 2013; Jamshidi et al. 2014). The semivariogram of each soil property was constructed using the following model:

$$ \gamma (h)=\frac{1}{2N(h)}\sum_{i=1}^{N(h)}{\left[Z\left({x}_i\right)-Z\left({x}_i+h\right)\right]}^2 $$
(1)

where γ(h) is the semivariance for the interval distance class h, h is the lag interval, and N(h) is the total number of sample pairs for the lag interval h. Z(x_i) is the measured sample value at location x_i, and Z(x_i + h) is the measured sample value at location x_i + h. The spherical, exponential, and Gaussian models were fitted to further investigate the spatial structure, and the theoretical model was chosen by minimizing the sum of the squared deviations between the experimental and theoretical semivariograms.
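As an illustration of Eq. 1, the following minimal sketch computes an experimental semivariogram from point data with NumPy. The function name, the binning of pairs into half-open lag intervals, and the parameters `lag_width` and `n_lags` are assumptions made for this example; the study itself computed its semivariograms in ArcGIS.

```python
import numpy as np

def experimental_semivariogram(coords, values, lag_width, n_lags):
    """Return mean lag distances, semivariances, and pair counts (Eq. 1)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Indices of all unique point pairs (i < j)
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2

    lags = np.full(n_lags, np.nan)
    gamma = np.full(n_lags, np.nan)
    counts = np.zeros(n_lags, dtype=int)
    for k in range(n_lags):
        # Assumed binning: pairs with distance in (k*w, (k+1)*w]
        in_bin = (dist > k * lag_width) & (dist <= (k + 1) * lag_width)
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            lags[k] = dist[in_bin].mean()
            # gamma(h) = sum of squared differences / (2 N(h))
            gamma[k] = sq_diff[in_bin].sum() / (2.0 * counts[k])
    return lags, gamma, counts
```

Lag classes that contain no pairs are returned as NaN so that they can be excluded before a theoretical model is fitted.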

The fitted model provides information about the spatial structure, as well as the input parameters for kriging interpolation. Among the several estimation methods, kriging is the most popular; it is a collection of generalized linear regression techniques for minimizing an estimation variance defined from a prior model for a covariance (Candela et al. 1988). Kriging is used not only to estimate values at unsampled locations but also to build probabilistic models of uncertainty about the estimated values (Machuca-Mory and Deutsch 2013). The kriging estimates can also be mapped to reveal the overall trend of the data (Goodchild et al. 2009).
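To make the role of the fitted semivariogram in kriging concrete, the sketch below solves an ordinary kriging system for a single target location. Ordinary kriging is used here only as a common textbook variant and is an assumption of this example, as are the function names and the spherical parameterization (total sill equal to nugget plus partial sill); it is not a reproduction of the ArcGIS workflow used in the study.

```python
import numpy as np

def spherical(h, nugget, sill, rng):
    """Spherical semivariogram with gamma(0) = 0 and total sill `sill`."""
    h = np.asarray(h, dtype=float)
    s = np.minimum(h / rng, 1.0)
    g = nugget + (sill - nugget) * (1.5 * s - 0.5 * s ** 3)
    return np.where(h > 0, g, 0.0)

def ordinary_kriging(coords, values, target, model):
    """Return the estimate, kriging variance, and weights at `target`."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Semivariances between all pairs of data points
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = model(d)
    A[n, n] = 0.0  # Lagrange multiplier row/column (weights sum to 1)
    # Semivariances between each data point and the target
    b = np.ones(n + 1)
    b[:n] = model(np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1))
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    estimate = float(weights @ values)
    variance = float(weights @ b[:n] + mu)
    return estimate, variance, weights
```

For example, with four samples at the corners of a 100-m square and a hypothetical model `lambda h: spherical(h, 0.1, 1.0, 150.0)`, the estimate at the center is the arithmetic mean of the four values, because symmetry gives every sample the same weight.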

The ratio of the nugget to the total sill value (NSR) was used to define distinct classes of spatial dependence for the soil variables. If the ratio was <25%, the variable was considered strongly spatially dependent. If the ratio was between 25 and 75%, the variable was considered moderately spatially dependent, and if the ratio was >75%, the variable was considered weakly spatially dependent (Cambardella et al. 1994).
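The classification rule above can be written as a short helper. This is a minimal sketch; the function name is hypothetical, and `sill` is assumed to be the total sill (nugget plus partial sill), as in the definition of the NSR.

```python
def spatial_dependence(nugget, sill):
    """Return the NSR in % and its class after Cambardella et al. (1994)."""
    nsr = 100.0 * nugget / sill
    if nsr < 25.0:
        label = "strong"
    elif nsr <= 75.0:
        label = "moderate"
    else:
        label = "weak"
    return nsr, label
```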

Similar to conventional statistics, a normal distribution for the variable under study is desirable in linear geostatistics (Clark and Allingham 2011). The ArcGIS 10 software was used for modeling the semivariogram and producing the contour map using kriging techniques. The best-fit model with the lowest root-mean-square error was selected for each soil property. Three variogram models, i.e., spherical, exponential, and Gaussian models, were fitted, and the nugget, sill, and range were estimated to provide information about the spatial variation of each soil property.
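The model comparison can be sketched as follows. For a fixed range, each of the three models is linear in the nugget and partial sill, so this example scans a user-supplied grid of candidate ranges and solves an ordinary least-squares problem for the other two parameters. The factor 3 in the exponential and Gaussian shapes (practical-range convention), the grid search, and all names are assumptions of this sketch; the fitting routine in ArcGIS may differ, and the sketch does not constrain the nugget or sill to be non-negative.

```python
import numpy as np

# Unit shape functions of s = h / range (assumed practical-range convention)
SHAPES = {
    "spherical": lambda s: np.where(s < 1.0, 1.5 * s - 0.5 * s ** 3, 1.0),
    "exponential": lambda s: 1.0 - np.exp(-3.0 * s),
    "gaussian": lambda s: 1.0 - np.exp(-3.0 * s ** 2),
}

def fit_variogram(lags, gamma, range_grid):
    """Pick the model and parameters with the smallest sum of squared errors."""
    lags = np.asarray(lags, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    best = None
    for name, shape in SHAPES.items():
        for a in range_grid:
            # gamma(h) = nugget + partial_sill * shape(h / a)
            X = np.column_stack([np.ones_like(lags), shape(lags / a)])
            coef, *_ = np.linalg.lstsq(X, gamma, rcond=None)
            sse = float(np.sum((X @ coef - gamma) ** 2))
            if best is None or sse < best["sse"]:
                best = {
                    "model": name,
                    "nugget": float(coef[0]),
                    "sill": float(coef[0] + coef[1]),
                    "range": a,
                    "sse": sse,
                }
    return best
```

The returned nugget and total sill can be passed directly to an NSR calculation to classify the spatial dependence.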

Determination of the sampling number for soil monitoring

Determination of the sampling number using the conventional statistics method

The Technical Specification for Soil Environmental Monitoring of China (HJ/T 166-2004) provides a method to determine the sampling number for monitoring soil properties based on conventional statistics, and it was selected for comparison with the geostatistics method in this study. The spatial relationship of the sampling points is not considered when calculating the sampling number using the conventional statistics method. The calculation formula of the sampling number provided by The Technical Specification for Soil Environmental Monitoring of China is as follows:

$$ N={t}^2{C_{\mathrm{V}}}^2/{m}^2 $$
(2)

where N is the sampling number; t is the t value for the given degrees of freedom at a selected confidence level (95% is generally selected for soil environmental monitoring), which can be determined according to Appendix A of the Technical Specification for Soil Environmental Monitoring of China; C_V is the coefficient of variation in %; and m is the acceptable relative deviation in %, which is generally limited to 20–30% for soil environmental monitoring.
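Eq. 2 can be evaluated directly, as in the minimal sketch below. The function name and the rounding up to the next whole sample are assumptions of this example, and the t value must be looked up by the user (it depends on the degrees of freedom and the confidence level); the inputs in the usage note are hypothetical and are not taken from Table 4.

```python
import math

def sampling_number(t, cv_percent, m_percent):
    """Sampling number N = t^2 * CV^2 / m^2 (Eq. 2), rounded up."""
    n = (t ** 2) * (cv_percent ** 2) / (m_percent ** 2)
    return math.ceil(n)
```

For instance, with a hypothetical t of 1.96, a CV of 30%, and an acceptable relative deviation of 20%, Eq. 2 gives 1.96² × 30² / 20² ≈ 8.64, so 9 sampling points would be required.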

Determination of the sampling number using the geostatistics method

Cross-validation is performed to determine the optimal sampling number (Aute et al. 2013). In split sampling, some data values are omitted during model calibration and used later to test the fitted model. If the data points are rather limited, the validation process uses the “fictitious point” method, in which one data point is removed at a time. The value at the omitted point is estimated using the adopted kriging model based on the remaining n − 1 points, and the estimated value is compared to the one observed at this point. The root-mean-square error (RMSE) is used to measure the accuracy of the kriging method using the following equation (Chaouai and Fytas 1991):

$$ RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^n{\left[Y\left({x}_i\right)-{Y}^{\ast}\left({x}_i\right)\right]}^2} $$
(3)

where Y(x_i) is the measured value at point i, Y*(x_i) is the predicted value at the same point, and n is the sampling number. The kriging interpolation is assumed to be the most accurate when the RMSE is at a minimum and stable, and the corresponding sampling number is considered the most rational.

In this study, 10, 20, 30, 40, 50, 60, and 70 sampling points were randomly selected from a total of 78 sampling points to perform the kriging interpolation. By analyzing the prediction accuracy of the reconstructed soil properties under different sampling points in the study area, the optimal sampling number for reconstructed soil monitoring was determined according to the minimum prediction error.
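The subsampling and cross-validation procedure can be sketched as follows. To keep the example self-contained, an inverse distance weighting predictor stands in for the kriging model, which is a deliberate simplification and not the method of this study; any predictor with the same call signature can be passed in instead. The function names, the three repeats per sample size, and the choice to run leave-one-out validation within each random subset are assumptions of this sketch.

```python
import numpy as np

def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values (Eq. 3)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))

def idw_predict(coords, values, target, power=2.0):
    """Stand-in predictor (inverse distance weighting), NOT kriging."""
    d = np.linalg.norm(coords - target, axis=1)
    if np.any(d == 0):
        return float(values[np.argmin(d)])
    w = 1.0 / d ** power
    return float(np.sum(w * values) / np.sum(w))

def loo_rmse(coords, values, predict=idw_predict):
    """Leave-one-out cross-validation: predict each point from the others."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    predicted = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        predicted[i] = predict(coords[keep], values[keep], coords[i])
    return rmse(values, predicted)

def rmse_by_sample_size(coords, values, sizes, repeats=3, seed=0,
                        predict=idw_predict):
    """Mean leave-one-out RMSE over random subsets of each requested size."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    result = {}
    for size in sizes:
        scores = []
        for _ in range(repeats):
            idx = rng.choice(len(values), size=size, replace=False)
            scores.append(loo_rmse(coords[idx], values[idx], predict))
        result[size] = float(np.mean(scores))
    return result
```

Plotting the returned RMSE against the sample size shows where the error levels off, which is the criterion used in this study to identify a rational sampling number.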

Results

Variability of soil properties

Descriptive statistics, including mean, standard deviation, median, CV, and minimum and maximum for the values of soil particle distribution, PR, TDS, and pH in the reconstructed soils are presented in Table 1. The mean and median values were used as the primary estimates of the central tendency, and the standard deviation, CV, minimum, and maximum were used as the estimates of variability from each site. Normality tests were conducted using the significance level of the K-S test, and all the values of soil particle distribution, PR, and pH passed the normality test (p > 0.05), except for TDS.

Table 1 Descriptive statistics for the soil properties

The mean and the median values were mostly similar, with the majority of median values either equal to or smaller than the mean values for all soil properties. This indicated that outliers did not dominate the measures of central tendency. Similar means and medians for several soil physical, chemical, and biological properties and for grain and biomass yields were also reported in other studies (Cambardella et al. 1994; Shukla and Lal 2005; Nyamadzawo et al. 2008). Soil particle distribution had a moderate CV (15–35%), except for silt content at the depth of 0–20 cm, which had a low CV (<15%), and sand content at the depth of 20–40 cm, which had a high CV (>35%). The pH showed a low CV, and TDS had a moderate CV at all depths. The CV for PR was high at the depth of 0–20 cm and moderate at the depth of 20–40 cm. A low CV for pH has also been reported elsewhere (Shukla et al. 2004b). The high CV for soil compaction properties has also been documented by other investigators (Ussiri et al. 2006; Barik et al. 2014). Overall, the descriptive statistics showed low soil variability in the study area.
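The CV classes used above (low <15%, moderate 15–35%, high >35%) can be expressed as a small helper. This is a minimal sketch with hypothetical function names; the use of the sample standard deviation (`ddof=1`) is an assumption, chosen because it matches the default of common statistics packages.

```python
import numpy as np

def coefficient_of_variation(values):
    """CV in %, using the sample standard deviation (assumed ddof=1)."""
    values = np.asarray(values, dtype=float)
    return float(100.0 * values.std(ddof=1) / values.mean())

def cv_class(cv_percent):
    """Classify variability as low (<15%), moderate (15-35%), or high (>35%)."""
    if cv_percent < 15.0:
        return "low"
    if cv_percent <= 35.0:
        return "moderate"
    return "high"
```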

Spatial variability of soil properties

Soil properties may vary due to intrinsic or extrinsic sources of variability. Descriptive statistics cannot discriminate between these two sources of variability. Therefore, the spatial correlation structure of each soil property was further examined. The soil properties at the experimental site differed in their spatial dependence, as determined by their semivariograms (Table 2). The semivariance ideally increases with the distance between sample locations (the lag distance) until it reaches a more or less constant value (the total sill). The distance at which the semivariance attains this constant value is known as the range of spatial dependence (Cambardella et al. 1994). Samples separated by a distance smaller than the range are spatially correlated, and those separated by a distance greater than the range are independent. The experimental semivariograms for all measured soil properties at all depths exhibited spatial structure. The semivariogram models and best-fitted model parameters are provided in Table 2. The semivariograms of the clay content at all depths, the silt content at a 0–20-cm depth, the sand content at a 0–20-cm depth, the PR at a 20–40-cm depth, and the TDS at a 0–20-cm depth were fitted to a spherical model with a nugget effect. The semivariograms of the silt content at a 20–40-cm depth, the sand content at a 20–40-cm depth, the pH except at a 40–60-cm depth, and the TDS at a 40–60-cm depth were fitted to an exponential model with a nugget effect, and the semivariograms of the PR at a 0–20-cm depth, the pH at a 40–60-cm depth, and the TDS at a 20–40-cm depth were fitted to a Gaussian model with a nugget effect. The existence of a positive nugget effect in some of the variables can be explained by sampling error, short-range variability, and unexplained variability (Burgos et al. 2006). The nugget semivariance was generally low for the pH and TDS but high for the soil particle distribution and PR at all depths. A higher nugget value tends to mask the spatial variability of the attributes. No definite trend for the nugget variance was obtained with increasing depth. All semivariograms are generally well structured with a small nugget effect, indicating that the sampling density is adequate to reveal the spatial structures (McGrath et al. 2004).

Table 2 The parameters of the variogram models for the soil properties

The nugget variance expressed as a percentage of the total semivariance enabled a comparison of the relative size of the nugget effect among the soil properties (Trangmar et al. 1985). The NSR is useful for defining the spatial dependence of those attributes for which the range values are similar. Using the NSR, the semivariograms indicated moderate spatial dependency for most of the parameters in the study area (Table 2). Strong spatial dependency (NSR <25%) was obtained for pH at the depths of 20–40 and 60–80 cm and for TDS at the depths of 20–40 and 60–80 cm. However, the range values were only similar at a 40–60-cm depth for TDS and pH.

Spatial distribution of soil properties

The results of the spatial dependence enabled the presentation of kriged maps of the different variables. Figures 3, 4, 5, and 6 show the contour maps obtained by simple kriging for the soil properties. Maps for each variable were maintained on the same scales and with the same contour intervals to allow for easier comparison. In general, the maps showed high variability, which was also obtained in the results of the statistical methods (Table 1). These maps help understand the variability of the soil properties in the study site by providing a visual representation and greater spatial detail.

Fig. 3 Spatial distribution (contour map) of the soil particles (a 0–20 cm sand content, b 20–40 cm sand content, c 0–20 cm silt content, d 20–40 cm silt content, e 0–20 cm clay content, and f 20–40 cm clay content; unit: %)

Fig. 4 Spatial distribution (contour map) of the soil penetration resistance (a 0–20 and b 20–40 cm; unit: kPa)

Fig. 5 Spatial distribution (contour map) of the pH (a 0–20, b 20–40, c 40–60, and d 60–80 cm)

Fig. 6 Spatial distribution (contour map) of the TDS (a 0–20, b 20–40, c 40–60, and d 60–80 cm; unit: %)

The spatial variability of soil particle distribution at the depth of 0–20 cm was higher than that at the depth of 20–40 cm. The clay content distribution had no obvious trend at the depth of 0–20 cm, and the areas with high clay content were distributed as strips in the northern, central, and southern parts of the study area. The spatial variability of clay content was very low at the depth of 20–40 cm, where the content was relatively uniform within the study area. The distributions of silt content and sand content were complementary. At the depth of 0–20 cm, the silt content was high in the northeast and center and low in the northwest and southwest, with a northwest-southeast orientation. The sand content distribution was the opposite; it was low in the northeast and center and high in the northwest and southwest, with the same northwest-southeast orientation. At the depth of 20–40 cm, the silt content was high in the area from northwest to southeast and low in the other areas, and the sand content showed the opposite pattern.

The PR at the depth of 0–20 cm was highest in the western part of the study area, followed by the central, northern, and southern parts, and it was relatively low in the other areas. The variability increased slightly with soil depth; at the depth of 20–40 cm, the area with high PR expanded, but the distribution pattern, in which the PR was high in the northern, central, and southern parts, was unchanged. The finding that the variability of PR at the depth of 0–20 cm was lower than that at the depth of 20–40 cm is opposite to findings reported for agricultural land (Junior et al. 2006). The pronounced horizontal variation in the PR suggested that the compaction caused by heavy machinery operations was not uniform across the field, which may be due to the effects of dumping technology and variations in the other soil properties (Barik et al. 2014).

In the study area, the reconstructed soils were alkaline. The pH showed no obvious trend over the depth of 0–80 cm, and its variation and distribution direction did not differ among the four soil depths. The heterogeneity of soil pH was lowest at the depth of 40–60 cm, and the soil pH ranged between 8.05 and 8.82 with a decreasing trend from southeast to northwest. The soil pH slightly decreased after disturbance in the study area, but it remained at a level consistent with that of the original landform.

The soil in the study area was generally non-salinized; only the TDS values at the depth of 0–20 cm in the northeast and at the depths of 40–60 and 60–80 cm in the north were near 0.01% and were classified as salinized soil. TDS showed no obvious change with increasing soil depth, although the TDS at the depth of 40–60 cm was higher than that at the other three soil depths.

Discussion

Mechanism of spatial variability of reconstructed soil properties

The spatial variability of a regionalized variable under study is given by its sill values. The CV of the soil properties was high, and there was no clear trend with increasing soil depth. The large CV indicated the heterogeneity of the reconstructed soils (Nyamadzawo et al. 2008; Kamberis et al. 2012). The heterogeneity in the reconstructed soils after dumping and before reclamation may be a result of mining activities. If the spatial distribution of soil properties is consistent, then the spatial variability is caused by structural factors (Zhang et al. 2010; Papadopoulou-Vrynioti et al. 2013). In this study area, the distribution of soil properties had no similarity, and this indicated that the spatial variability mainly arose from random factors rather than structural factors. For a large-scale coal mine, the disturbance by large mechanical recycling operations and humans has far-reaching impacts on soil development. The excavation, transport, and dumping activities significantly disrupted the soils, which created the spatial variability of the soil properties. Severe soil compaction and vegetation destruction also led to the loss of soil aggregates and soil structure (Shukla et al. 2004a; Jacinthe and Lal 2006; Rozos et al. 2013). Therefore, it is important to reclaim the land and restore vegetation to disturbed soils in mining areas.

The optimal rational sampling number for soil monitoring

In 2011, the Land Reclamation Regulation was promulgated in China. Article 31 of the Regulation stipulates that, when destroyed land in a mining area is reclaimed for agricultural use, soil properties should be monitored and assessed so that suggestions and measures for soil improvement can be proposed. The study site had no plants and would soon be reclaimed, and thus, the changes in the soil particle distribution, PR, pH, and TDS after reclamation should be monitored (Shukla et al. 2004a). Therefore, the number of monitoring points should be determined, and the monitoring points should be planned (Rozane et al. 2011).

A total of 10, 20, 30, 40, 50, 60, 70, and 78 sample points were randomly selected from the 78 sample points for cross-validation, and the RMSEs of the different soil properties at different depths were calculated. For accurate results, the selection process was repeated three times (i.e., three treatments) (Rodeghiero and Cescatti 2008). The RMSEs of the soil properties are shown in Table 3. The RMSE values of the clay content, silt content, sand content, and PR decreased with an increasing number of sample points at the depths of 0–20 and 20–40 cm and stabilized at 40, 50, 40, and 50 sample points, respectively, for the 0–20-cm depth and at 40, 40, 40, and 50 sample points, respectively, for the 20–40-cm depth. The RMSE values of the pH had a similar trend at the four depths and stabilized at 50 sample points for all depths. The RMSE values of the TDS fluctuated widely and then stabilized at 50 sample points at the depths of 0–20 and 20–40 cm; at the depths of 40–60 and 60–80 cm, they decreased with an increasing number of sample points and stabilized at 50 and 70 sample points, respectively.

Table 3 The RMSEs of the different soil property parameters under different sampling numbers

The rational sampling number calculated by the conventional statistical methods was less than 20 (Table 4) and was between 30 and 50 based on the geostatistical method. To not only meet the accuracy requirements of the regulated standards but also to consider the spatial relationship between the sampling points, the rational sampling number was determined as 40 in the study area. After the determination of sampling number, an optimal sampling scheme can be designed. Some optimal environmental monitoring networks have been designed using the MSANOS software (Barca et al. 2015), a multifactor map (Youssef et al. 2015; Bathrellos et al. 2017), and the variance quadtree algorithm method (Minasny et al. 2007), and these methods were good references for reclaimed soil monitoring in mining areas. Therefore, the next research priority should be focused on the sampling scheme for reclaimed land monitoring.

Table 4 The optimal sampling number for soil properties monitoring based on the conventional statistics method

In local studies, the application of the procedure is able to present direct spatial variability of reconstructed soil properties and determine the optimal soil sampling number for reclaimed land monitoring. Therefore, the monitoring scheme of the soil properties of reclaimed land in these areas is easily designed during the early planning stages. The optimal soil sampling number of an area can be logically estimated. Therefore, engineers, planners, decision makers, and environmental managers can utilize the proposed procedure in new and existing mined land reclamation projects. Additionally, the proposed methodology may be used by the local authorities to guide the adoption of policies and strategies aiming towards mined land reclamation monitoring.

Conclusions

The following conclusions can be drawn from our findings.

  1. There was moderate spatial variability of the soil properties in reconstructed soils after dumping and before reclamation according to statistical and geostatistical analyses.

  2. A geostatistical analysis was useful for estimating the soil properties and interpreting the spatial variability, and considerable heterogeneity of these variables was observed from the contour maps.

  3. Mining activities significantly disrupted the mine soils, and the spatial structure of the original landform was partially or completely destroyed. Land reclamation is an important measure in developing soil properties.

  4. Based on traditional statistical and geostatistical methods, further monitoring of the soil properties is necessary to evaluate the effects of reclamation over time. Cross-validation can be used to test the accuracy of the geostatistics, and RMSE can be used to measure the accuracy of the kriging method and determine the optimal number of sampling points in the monitoring of soil properties.