1 Introduction

In the coastal regions of China, available land is scarce because of continuous population growth and the wide distribution of hilly and mountainous terrain; e.g., ~70.4 % of the land in Zhejiang Province consists of low hills and mountains. Reclaimed tidelands are therefore an important land resource, but their highly saline soils adversely affect agricultural productivity. To improve the utilization and management of reclaimed tidelands, it is first necessary to map the spatial variability of soil salinity and other soil properties efficiently and accurately.

To improve the efficiency of soil mapping, ancillary data have been used to support conventional soil mapping through geostatistical interpolation and inference algorithms, an approach known as digital soil mapping (DSM) (McBratney et al. 2003; Arrouays et al. 2014). Ancillary data include remote and proximal sensing data. Among remote sensing techniques, radar (microwave) sensing is advantageous because it can operate in all weather conditions, whereas optical sensors (e.g. near-infrared sensors) cannot observe the ground through cloud. In particular, L-band microwaves can penetrate vegetation and soil to some extent (McColl et al. 2012; Kobayashi et al. 2012) and can therefore be used to determine the moisture content of the topsoil (Pellarin et al. 2003; Paloscia et al. 2012). Among proximal sensors, electromagnetic (EM) induction meters have been widely used because they measure the apparent soil electrical conductivity (ECa) rapidly, noninvasively and to a considerable depth (e.g. 1.5 m for the EM38). Over the past 30 years, EM induction data have been successfully used to map various soil properties, including salinity (Douaik et al. 2004; Guo et al. 2015; Li et al. 2015), clay content (Sudduth et al. 2003; Buchanan et al. 2012; Piikki et al. 2015) and moisture (Robinson et al. 2012).

In addition to ancillary data, choosing a suitable sampling scheme is crucial when conducting a soil survey. Given the different patterns of soil spatial variability, a range of sampling approaches have been proposed, including simple random sampling (Brus et al. 2011; Wang et al. 2012; Webster and Lark 2013; Brus 2015), stratified sampling (Wallenius et al. 2011; Chen et al. 2015), grid sampling (Montanari et al. 2012; Barca et al. 2015; Huang et al. 2015a, b) and the variance quad-tree method (Li et al. 2007; Yao et al. 2012). Response surface methodology (RSM) (Box and Wilson 1951) is a widely used design method in industry, which spaces sample locations apart to minimize the possibility of spatial autocorrelation and aims to reduce the cost of expensive analyses and their associated numerical noise. It has shown advantages in a number of optimization applications (Venter et al. 1996). With the advent of ancillary data in DSM, RSM and RSM-based software (e.g. ESAP) have been used to support soil sampling design (Amezketa and de Lersundi 2008; Lobell et al. 2010). Previous studies have concluded that RSM is highly effective in estimating model parameters and ensuring unbiased prediction (Corwin and Lesch 2003; Eigenberg et al. 2008; Shanbedi et al. 2015).

RSM was originally developed to facilitate the estimation of soil salinity from apparent soil electrical conductivity (ECa) survey data (Lesch 2005). However, the underlying statistical methodology is quite general and directly applicable to sampling for precision farming more broadly. Fitzgerald et al. (2006) also pointed out that this method could accommodate other types of georeferenced survey data, such as remotely sensed imagery. Yet, apart from a few studies combining remote and proximal sensing in DSM (Triantafilis et al. 2009; De Benedetto et al. 2013; Guo et al. 2013; Priori et al. 2013; Huang et al. 2015a, b; Rodrigues et al. 2015), these two types of ancillary data have rarely been used to support sampling design or to map important soil chemical properties [e.g. soil organic matter (SOM), available nitrogen (AN) and available potassium (AK)]. In this study, we evaluated the ability of RSM to direct ground sampling when radar imagery is used together with ECa as input to the ESAP software, and produced predictive maps of soil attributes. In addition, RSM-based sampling design was used in combination with the ancillary data to map the spatial variability of three soil chemical properties (i.e. SOM, AN and AK), which, to our knowledge, has not been reported before.

2 Materials and methods

2.1 Study area

The study area is a 2.25 ha paddy field in a coastal saline area in the north of Shangyu City, on the southern shore of the Hangzhou Gulf (Fig. 1a), Zhejiang Province, China. Over the past 50 years, approximately 17,000 ha of coastal land has been reclaimed around Shangyu City in successive programs (Fig. 1b). The soil is derived from recent marine and fluvial deposits. The reclaimed land has mainly been used to grow cotton, cereals (e.g. wheat and rice) and horticultural crops (e.g. watermelons and grapes), while some of it has been used for aquaculture (e.g. prawns and fish). The study area was reclaimed in 1996. The climate is subtropical, with an average annual temperature of 16.5 °C and an average annual precipitation of 1300 mm.

Fig. 1

Location of the study field with reference to (a) the Hangzhou Gulf and (b) reclaimed lands over the past 40 years

2.2 Data collection, processing and harmonisation

Remotely sensed radar data were acquired by the Advanced Land Observing Satellite (ALOS) of the Japanese Earth Observing Satellite Program. ALOS carries the Phased Array type L-band Synthetic Aperture Radar (PALSAR), an active sensor operating at L-band (1270 MHz) that provides cloud-free, day-and-night land observation (JAXA EORC) and high-resolution (i.e. 12.5 m) imagery in single-polarization (HH) or dual-polarization mode. The image of the study area was acquired on 21 November 2010. We used HH-polarization data from the Level 1.5 PALSAR product, which has a pixel spacing of 12.5 m, is multi-look processed and projected onto map coordinates, and is therefore easily integrated with other georeferenced image data. Image rectification was carried out in ENVI 4.7 (ESRI Inc., 2012), using ground control points chosen from a registered 1:10,000 topographic map from the Bureau of Surveying and Mapping of Zhejiang Province. Topographic effects were not considered because the study area is flat and located in the lowland plains.

The data were subsequently used to calculate the backscattering coefficient (σ0) using the following equation:

$$ \sigma^{0} = 10 \log_{10}\left(\text{DN}^{2}\right) + \varepsilon, $$

where DN is the digital number (grey value) of the radar image and ε is the radar calibration coefficient (ε = −83.0 dB for ALOS/PALSAR Level 1.5 data; Shimada et al. 2009). During the study period the soil was bare, so σ0 reflected soil moisture and was also a function of soil salinity.
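
The calibration above is a one-line transformation per pixel. A minimal Python sketch, assuming the Level 1.5 DN values have already been read from the image as plain numbers (raster I/O is omitted); the function and constant names are hypothetical:

```python
import math

# Calibration factor for ALOS/PALSAR Level 1.5 data (Shimada et al. 2009), in dB.
PALSAR_CF_DB = -83.0


def sigma0_db(dn: float, cf_db: float = PALSAR_CF_DB) -> float:
    """Convert a Level 1.5 digital number (DN) to the backscattering coefficient in dB.

    Implements sigma0 = 10 * log10(DN**2) + epsilon.
    """
    if dn <= 0:
        raise ValueError("DN must be positive to take the logarithm")
    return 10.0 * math.log10(dn ** 2) + cf_db


# A DN of 10,000 gives 10 * log10(1e8) - 83 = 80 - 83 = -3 dB.
print(sigma0_db(10_000))
```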

Proximally sensed data were collected on 21 November 2010. A total of 768 measurements of apparent soil electrical conductivity (ECa, mS/m) were made on an approximately 5 m grid (Fig. 2a) using a Geonics EM38 in the horizontal mode of operation, which provides information about the root zone (i.e. 0–0.75 m). Measurements were georeferenced with a Trimble Global Positioning System with differential correction (accuracy within 2 m).

Fig. 2

Spatial distributions of (a) EM38 survey locations and grid soil sampling locations and (b) sampling locations generated by response surface methodology (RSM) with reference to the EM38 survey locations

The remotely sensed radar imagery and proximally sensed EM data were harmonized by extracting the σ0 value at the longitude and latitude of each of the 768 ECa locations, using the nearest-neighbor algorithm in ArcGIS 9.3 (ESRI Inc. 2012).
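
The harmonization step amounts to looking up, for each ECa point, the value of the radar pixel that contains it. The sketch below assumes the σ0 image is a north-up grid with 12.5 m pixels and that point coordinates have already been projected to metres in the same system as the grid origin; the function name and the toy raster are hypothetical, and this is not the ArcGIS implementation:

```python
from typing import List, Sequence, Tuple

PIXEL_SIZE_M = 12.5  # PALSAR Level 1.5 pixel spacing


def extract_nearest(raster: Sequence[Sequence[float]], x_origin: float, y_origin: float,
                    points: Sequence[Tuple[float, float]],
                    pixel_size: float = PIXEL_SIZE_M) -> List[float]:
    """Return, for each (x, y) point, the value of the pixel that contains it.

    The pixel containing a point is also the one whose centre is nearest, so
    this is nearest-neighbour sampling. Assumes a north-up raster whose
    upper-left corner is (x_origin, y_origin); rows run south, columns east.
    """
    n_rows, n_cols = len(raster), len(raster[0])
    values = []
    for x, y in points:
        col = int((x - x_origin) // pixel_size)
        row = int((y_origin - y) // pixel_size)
        # Clamp so that points just outside the image edge take the edge pixel.
        col = min(max(col, 0), n_cols - 1)
        row = min(max(row, 0), n_rows - 1)
        values.append(raster[row][col])
    return values


# Hypothetical 2 x 2 sigma0 image (dB) with its upper-left corner at (0, 25).
sigma0_img = [[-7.0, -8.0], [-9.0, -10.0]]
print(extract_nearest(sigma0_img, 0.0, 25.0, [(5.0, 20.0), (20.0, 5.0)]))  # [-7.0, -10.0]
```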

2.3 Soil sampling and laboratory analysis

Figure 2a shows the locations of the 42 soil samples collected on an approximately 20 m grid. A further 12 samples were located using RSM (Fig. 2b); details of the RSM-based sampling method are given in Sect. 2.4. All samples were collected from the topsoil (i.e. 0–0.20 m), air-dried and passed through a 2 mm sieve before analysis. Samples were analysed following the procedures described in Bao (2007). In brief, SOM was determined colorimetrically after H2SO4-dichromate oxidation at 150 °C, AN was measured by the alkaline hydrolysis diffusion method, and AK was extracted with NH4OAc and determined with a flame photometer.

2.4 Directed sampling by response surface methodology (RSM)

In this study, the RSM-based software ESAP (Lesch and Rhoades 2006) was used to select 12 sampling locations (Fig. 2b). The principles and applications of this method have been described in detail by Lesch et al. (2000), Lesch (2005) and Fitzgerald et al. (2006). In brief, RSM assumes a linear relationship between the spatial ancillary data (e.g. ECa) and the target variable (e.g. soil salinity).

Figure 3 shows the flowchart of the whole procedure (Lesch 2005). In the first step, the matrix X of acquired data (ECa and σ0) was transformed by principal component (PC) analysis into a standardized matrix X′ (i.e. each score was standardized to zero mean and unit variance), and unusual readings (i.e. outliers) were removed on the basis of their standard deviation (SD). This process was repeated until no observation lay more than 4 SD from the mean. In the second step, a traditional rotatable central composite response surface design (CCRSD) was imposed on the transformed, decorrelated data, and in the third step the initial candidate sites were identified. Finally, the optimized sample sites were selected from the initial candidate sites using an iterative algorithm that maximized the minimum separation distance between adjacent sample sites. At this step, an optimization criterion (OptCri) was used to evaluate how uniformly (i.e. how evenly across the field) the selected sampling plan is spread. More specifically, for a sample of size n, the program calculates the approximate maximum possible separation distance (SDp) that a uniformly spaced sampling pattern could achieve, then calculates the average separation distance achieved by the current design (SDa) and computes the OptCri score as SDp/SDa. In general, a uniform sampling plan has an OptCri value of 1.30 or less, whereas a highly non-uniform or unacceptable design typically has a value of 1.75 or more (Lesch et al. 2000).
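
The first RSM step can be sketched for the two-variable case used here (ECa and σ0). This is a minimal illustration under stated assumptions, not ESAP's implementation: a reading is flagged if either standardized variable lies more than 4 SD from the mean (ESAP's exact outlier rule may differ), and the PC scores are rescaled to unit variance so that distances in the transformed space can be treated as statistical distances. All function names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def zscores(values: List[float]) -> List[float]:
    """Standardize to zero mean and unit (sample) variance."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def remove_outliers(eca: List[float], sigma0: List[float], k: float = 4.0) -> List[int]:
    """Return indices kept after iteratively dropping observations that lie
    more than k SD from the mean of either variable (assumed rule)."""
    keep = list(range(len(eca)))
    while True:
        z1 = zscores([eca[i] for i in keep])
        z2 = zscores([sigma0[i] for i in keep])
        flagged = {j for j in range(len(keep)) if abs(z1[j]) > k or abs(z2[j]) > k}
        if not flagged:
            return keep
        # Statistics are recomputed on the reduced set in the next pass.
        keep = [idx for j, idx in enumerate(keep) if j not in flagged]


def decorrelate(eca: List[float], sigma0: List[float]) -> Tuple[List[float], List[float]]:
    """Rotate the standardized variables onto their principal components and
    rescale each component to unit variance (pc1 is the leading one when r > 0)."""
    z1, z2 = zscores(eca), zscores(sigma0)
    n = len(z1)
    # Sample correlation of the z-scores (their sample variances are exactly 1).
    r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)
    # For a 2 x 2 correlation matrix the eigenvectors are (1, 1)/sqrt(2) and
    # (1, -1)/sqrt(2), with eigenvalues 1 + r and 1 - r (requires |r| < 1).
    pc1 = [(a + b) / math.sqrt(2.0 * (1.0 + r)) for a, b in zip(z1, z2)]
    pc2 = [(a - b) / math.sqrt(2.0 * (1.0 - r)) for a, b in zip(z1, z2)]
    return pc1, pc2
```

In use, `remove_outliers` is applied first and `decorrelate` is then run on the retained readings only.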

Fig. 3

Flowchart of response surface methodology (RSM)

3 Results and discussion

3.1 Spatial variability of soil moisture inverted from radar images

Figure 4a shows the distribution of σ0 extracted from the ALOS/PALSAR radar imagery. Statistical analysis shows that σ0 ranges from −10.80 to 0.75 dB, with a mean of −7.44 dB. Several clusters in the middle of the study area have very small σ0 values (i.e. σ0 < −10 dB), whereas relatively large σ0 values, approaching the maximum of 0.75 dB, occur along the northern margin of the field. Intermediate σ0 values are found between the central clusters and the northern margin.

Fig. 4

(a) Backscattering coefficient (σ0) derived from remotely sensed radar imagery and (b) spatial distribution of soil moisture inverted from σ0

Regarding the relationship between soil moisture and σ0, previous studies have suggested that a simple regression model relating σ0 extracted from ALOS/PALSAR to measured volumetric soil water content can be used to invert surface soil moisture (Pellarin et al. 2003). For bare soil, surface roughness, one of the main factors affecting σ0, can be treated as constant. Using the positive linear regression model of Sonobe and Tani (2009), σ0 = 0.22y − 19.78, and assuming that soil roughness is homogeneous across the study area, we inverted soil moisture (y) as y = (σ0 + 19.78)/0.22. The resulting distribution of soil moisture (Fig. 4b) mirrors that of σ0: where σ0 is large (e.g. along the northern margin), soil moisture is high (e.g. >70 %), and where σ0 is small (e.g. in the central field), soil moisture is low (e.g. <20 %).
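
Because the inversion is linear, it can be applied directly to every σ0 pixel. A minimal sketch using the coefficients quoted above, assuming y is expressed in percent; the function names and example values are hypothetical:

```python
# Linear backscatter-moisture model used in the text: sigma0 = 0.22 * y - 19.78,
# with sigma0 in dB and y the soil moisture in %.
SLOPE_DB_PER_PCT = 0.22
INTERCEPT_DB = -19.78


def moisture_from_sigma0(sigma0_db: float) -> float:
    """Invert the linear model: y = (sigma0 + 19.78) / 0.22."""
    return (sigma0_db - INTERCEPT_DB) / SLOPE_DB_PER_PCT


def sigma0_from_moisture(moisture_pct: float) -> float:
    """Forward model, handy for checking the inversion."""
    return SLOPE_DB_PER_PCT * moisture_pct + INTERCEPT_DB


# Apply the inversion pixel by pixel to a hypothetical row of sigma0 values (dB).
row = [-15.38, -8.78]
print([round(moisture_from_sigma0(s), 1) for s in row])
```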

3.2 Spatial variability of soil salinity determined by EM38

Statistical analysis of the 768 ECa measurements shows that ECa has a mean of 114.02 mS/m, a skewness of −0.7824 and a kurtosis of −0.7641. An exponential model gave the best fit to the experimental semivariogram of ECa (Fig. 5a), with a coefficient of determination of 0.990. The model has a nugget (C0) of 630 (mS/m)², a sill (C + C0) of 5128 (mS/m)² and a range (A) of 247.80 m. The relatively large nugget of the ECa data is most likely due to the uneven distribution of soil salinity caused by ridge and furrow irrigation.
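
For reference, the fitted model can be evaluated at any lag. The sketch below assumes the common form γ(h) = C0 + C[1 − exp(−h/s)]; whether the reported range A corresponds to the practical range (s = A/3) or to the distance parameter itself (s = A) depends on the software, so both options are exposed through a hypothetical `practical_range` flag. Semivariances are in (mS/m)².

```python
import math

# Exponential model fitted to the ECa semivariogram in the text.
NUGGET = 630.0      # C0, (mS/m)^2
SILL = 5128.0       # C + C0, (mS/m)^2
RANGE_M = 247.80    # A, m


def exponential_gamma(h: float, nugget: float = NUGGET, sill: float = SILL,
                      range_m: float = RANGE_M, practical_range: bool = True) -> float:
    """Exponential semivariogram gamma(h) = C0 + C * (1 - exp(-h / s)).

    With practical_range=True, s = A / 3 so that gamma reaches ~95 % of C at h = A;
    with practical_range=False, s = A (the other common parameterisation).
    """
    if h <= 0.0:
        return 0.0  # gamma(0) = 0 by definition; C0 is the limit as h -> 0+
    c = sill - nugget
    s = range_m / 3.0 if practical_range else range_m
    return nugget + c * (1.0 - math.exp(-h / s))


# Nugget-to-sill ratio, a common index of spatial dependence.
print(round(NUGGET / SILL, 3))  # 0.123
for h in (5.0, 20.0, 100.0, 247.8):
    print(h, round(exponential_gamma(h), 1))
```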

Fig. 5

(a) Semivariogram of ECa with fitted exponential model and (b) spatial distribution of ECa produced by ordinary kriging

Ordinary kriging of ECa with the exponential model was carried out in ArcGIS 9.3 (ESRI Inc., 2012) to map the spatial distribution of soil salinity (Fig. 5b). The kriged ECa map shows clear spatial variation across the field, with large ECa values (i.e. >150 mS/m) in the right half of the field and small values (i.e. <125 mS/m) in the left half. Very small ECa values (i.e. <100 mS/m) occur along the margins of the study area. This distribution may result from the local topography, the drainage ditches near the field and/or long-term farming practices (e.g. ridge building around the field, and irrigation and drainage for rice). Interestingly, the distribution of ECa shows some similarity to that of soil moisture (Fig. 4b), especially for the extreme values in the centre and along the northern margin.
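
Ordinary kriging estimates each unsampled location as a weighted average of the observations, with weights obtained by solving a linear system built from the semivariogram model and constrained to sum to one. The following minimal sketch uses the exponential ECa model reported above (assuming the practical-range form), uses all samples instead of the search neighbourhood ArcGIS would apply, and relies on a small hand-written solver; the data points are hypothetical:

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, value)


def exp_covariance(h: float, nugget: float = 630.0, sill: float = 5128.0,
                   range_m: float = 247.80) -> float:
    """Covariance C(h) = sill - gamma(h) for the exponential model (practical range)."""
    if h == 0.0:
        return sill
    return (sill - nugget) * math.exp(-3.0 * h / range_m)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(samples: Sequence[Sample], target: Tuple[float, float],
                     cov=exp_covariance) -> Tuple[float, float]:
    """Return (estimate, kriging variance) at target using all samples."""
    n = len(samples)
    pts = [(x, y) for x, y, _ in samples]
    # Covariance matrix bordered by the unbiasedness constraint (weights sum to 1).
    a = [[cov(math.dist(pts[i], pts[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [cov(math.dist(p, target)) for p in pts] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * z for w, (_, _, z) in zip(weights, samples))
    variance = cov(0.0) - sum(w * c for w, c in zip(weights, b[:n])) - mu
    return estimate, variance


# Hypothetical ECa readings (mS/m) at the corners of a 10 m square.
obs = [(0.0, 0.0, 100.0), (10.0, 0.0, 120.0), (0.0, 10.0, 110.0), (10.0, 10.0, 130.0)]
print(ordinary_kriging(obs, (5.0, 5.0)))
```

At a sample location the system returns that sample's value with zero kriging variance, and at the centre of the square the four symmetric weights reduce the estimate to the mean of the corners.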

3.3 RSM analysis of ECa and backscattering coefficient

Based on the spatial distributions of soil moisture and salinity delineated by radar imagery and EM38, respectively, RSM was used to determine an optimal set of soil sampling locations. In the first step of the RSM procedure, eight outliers lying more than 4 SD from the mean were removed (Table 1). Figure 6 shows the initial candidate sites determined by CCRSD. Imposing this second-order central composite design on the transformed ECa and σ0 data was highly effective in minimizing the overall number of calibration sample sites. In the third step, two “candidate” survey sites were selected for each design level solely on the basis of their statistical distance from the design-level coordinates. Finally, a set of 12 sample sites was determined with an OptCri value of 0.85 (Fig. 2b). Table 2 summarizes the CCRSD and optimized design levels of the 12 sampling locations. An OptCri value of 0.85, well below the 1.30 threshold for a uniform plan, indicates excellent uniformity of the sampling design. It is also worth noting that 7 of the 12 selected sites are located along the margins of the study area, where σ0 (Fig. 4a) and ECa (Fig. 5b) change sharply. This suggests that RSM-based sampling depends strongly on the characteristics of the input dataset.
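
The second to fourth RSM steps can be illustrated with a simplified sketch. It uses the textbook 9-point rotatable central composite design for two factors (ESAP's 12-site configuration and the scaling of its design levels are not reproduced here), selects the two observations nearest each design level in the decorrelated PC space, and computes an OptCri-like score. The formula for SDp is an assumption (the spacing of a square grid holding n points in the field area) because the exact ESAP expression is not given in the text; all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rotatable_ccd(scale: float = 1.0) -> List[Point]:
    """Textbook two-factor rotatable central composite design: one centre point,
    four factorial points (+/-1, +/-1) and four axial points at +/-alpha,
    where alpha = (2 ** 2) ** 0.25 = sqrt(2)."""
    alpha = math.sqrt(2.0)
    pts = [(0.0, 0.0)]
    pts += [(sx, sy) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0)]
    pts += [(alpha, 0.0), (-alpha, 0.0), (0.0, alpha), (0.0, -alpha)]
    return [(scale * x, scale * y) for x, y in pts]


def candidate_sites(scores: Sequence[Point], design: Sequence[Point],
                    per_level: int = 2) -> List[List[int]]:
    """For each design level, indices of the `per_level` observations whose
    whitened (PC1, PC2) scores are closest to it. Because the scores are
    decorrelated and unit-variance, Euclidean distance here is a statistical distance."""
    out = []
    for d in design:
        ranked = sorted(range(len(scores)), key=lambda i: math.dist(scores[i], d))
        out.append(ranked[:per_level])
    return out


def opt_cri(sites: Sequence[Point], area_m2: float) -> float:
    """Hypothetical OptCri = SDp / SDa.

    SDp is approximated as the spacing of a square grid holding n points in the
    given area; SDa is the mean nearest-neighbour distance of the chosen sites.
    """
    n = len(sites)
    sdp = math.sqrt(area_m2 / n)
    sda = sum(min(math.dist(p, q) for j, q in enumerate(sites) if j != i)
              for i, p in enumerate(sites)) / n
    return sdp / sda
```

Under this assumption, 12 sites in the 2.25 ha (22,500 m²) field would give SDp = √(22,500/12) ≈ 43.3 m.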

Table 1 Eight outliers removed (>4 SD) based on the standard deviations (SD) from the mean
Fig. 6

Central composite response surface design (CCRSD) overlaid onto ECa (PC1) and σ0 (PC2) transformed by principal component (PC) analysis

Table 2 Summary results from the RSM sampling design

Without a priori information about the spatial variability of soil properties, random or regular-interval sampling should be applied to evaluate soil quality (Halvorson et al. 1997). In this study, rapidly acquired, low-cost ancillary data (i.e. radar imagery and EM38 data) provided a priori spatial information on soil moisture and salinity, which are crucial to soil quality in coastal regions.

3.4 Characterizing soil spatial variation using RSM sampling strategy

The 12 soil samples selected with the RSM strategy were used to characterize the spatial variation of SOM, AN and AK in the study area, with the 42 grid samples (Fig. 2a) serving as a reference.

Table 3 gives descriptive statistics of SOM, AN and AK for the 42 grid samples and the 12 RSM-based samples. The mean topsoil SOM values are almost identical (15.12 g/kg), and the mean values of AN (50.42 and 50.61 mg/kg) and AK (120.21 and 126.87 mg/kg) are quite similar. Pearson correlation coefficients between the soil properties and the ancillary data (i.e. ECa and σ0) are also given in Table 3. Notably, the two sampling strategies yield similar correlation coefficients. For both sampling plans, ECa is significantly correlated with SOM and AN (P < 0.01).
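
The correlation coefficients in Table 3 are standard Pearson coefficients, and their significance can be checked with a t statistic on n − 2 degrees of freedom. A minimal sketch (function names are hypothetical):

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), compared with Student's t on n - 2 df."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

Because the critical t value is larger for n − 2 = 10 than for n − 2 = 40, a given |r| must be larger to reach significance with the 12 RSM samples than with the 42 grid samples.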

Table 3 Descriptive statistics of soil organic matter (SOM, g/kg), available nitrogen (AN, mg/kg) and available potassium (AK, mg/kg) for the 42 grid samples and 12 RSM-generated samples, and Pearson correlation coefficients between these soil properties and ECa and the backscattering coefficient (σ0)

Statistical differences between the mean values of soil properties for the two sampling designs were evaluated by Student’s t test with Tukey–Kramer comparisons of means (Table 4). The values listed are the absolute differences between the means minus the least significant difference (i.e. abs-LSD); a negative value indicates that the two means are not significantly different. This suggests that RSM-based sampling can capture information on soil properties (i.e. SOM, AN and AK) similar to that obtained from the high-density grid sampling plan.
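
The abs-LSD quantity can be computed as follows. In this minimal sketch the studentized-range critical value q(α; k, df) is passed in from a table rather than computed; with only two sampling designs (k = 2), q/√2 equals the two-sided Student's t critical value, so the Tukey–Kramer comparison reduces to a pooled two-sample t test. Names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def abs_diff_minus_lsd(a: Sequence[float], b: Sequence[float], q_crit: float) -> float:
    """|mean(a) - mean(b)| minus the Tukey-Kramer least significant difference.

    q_crit is the studentized-range critical value q(alpha; k, df) for k groups and
    df = N - k error degrees of freedom; it must be looked up in a table.
    A negative result means the two means are not significantly different.
    """
    na, nb = len(a), len(b)
    # Pooled within-group variance (the ANOVA mean squared error for two groups).
    mse = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    lsd = q_crit / math.sqrt(2.0) * math.sqrt(mse * (1.0 / na + 1.0 / nb))
    return abs(statistics.fmean(a) - statistics.fmean(b)) - lsd
```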

Table 4 Comparisons of means using the Tukey–Kramer HSD test for soil organic matter (SOM, g/kg), available nitrogen (AN, mg/kg) and available potassium (AK, mg/kg) based on 42 grid samples and 12 RSM-based samples

To further compare the prediction performance of the two sampling plans, we mapped the soil properties from each set of samples. Figure 7 shows the spatial distributions of SOM, AN and AK interpolated by the inverse distance weighted (IDW) method in ArcGIS 9.3 (ESRI Inc., 2012) from the grid samples (left) and the RSM samples (right). The maps from the two sampling strategies show quite similar patterns of ‘high’ and ‘low’ values for each soil property. The raster maps interpolated from the RSM-based samples are strongly correlated with those from the grid samples, with spatial correlation coefficients of 0.83, 0.87 and 0.76 for SOM, AN and AK, respectively.
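
A minimal IDW sketch follows, assuming a power of 2 (a common default; the power and search settings used in ArcGIS for this study are not reported) and interpreting the spatial correlation coefficient as a cell-by-cell Pearson correlation between two co-registered rasters, which is an assumption about how the coefficients were obtained. All names are hypothetical:

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, value)


def idw(samples: Sequence[Sample], target: Tuple[float, float], power: float = 2.0) -> float:
    """Inverse distance weighted estimate using every sample (no search radius)."""
    num = den = 0.0
    for x, y, z in samples:
        d = math.dist((x, y), target)
        if d == 0.0:
            return z  # IDW is an exact interpolator at sample locations
        w = d ** -power
        num += w * z
        den += w
    return num / den


def idw_grid(samples: Sequence[Sample], x_coords: Sequence[float],
             y_coords: Sequence[float], power: float = 2.0) -> List[float]:
    """Interpolate onto a regular grid, returned as a flat list (row by row)."""
    return [idw(samples, (x, y), power) for y in y_coords for x in x_coords]


def cell_correlation(map_a: Sequence[float], map_b: Sequence[float]) -> float:
    """Pearson correlation between two co-registered rasters, cell by cell."""
    n = len(map_a)
    ma, mb = sum(map_a) / n, sum(map_b) / n
    sab = sum((a - ma) * (b - mb) for a, b in zip(map_a, map_b))
    saa = sum((a - ma) ** 2 for a in map_a)
    sbb = sum((b - mb) ** 2 for b in map_b)
    return sab / math.sqrt(saa * sbb)
```

Interpolating both sample sets onto the same grid with `idw_grid` and passing the two results to `cell_correlation` gives one coefficient per soil property.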

Fig. 7

Inverse distance weighted (IDW) interpolated maps of (a) soil organic matter (SOM), (b) available nitrogen (AN) and (c) available potassium (AK) based on 42 grid samples (left) and 12 RSM-generated samples (right)

One aim of precision agriculture is to characterize the spatial variability of soil to enable precise fertilization. Here, prediction precision and bias were calculated with the cross-validation tool in ArcGIS 9.3 (ESRI Inc., 2012) (Table 5). In terms of prediction precision, grid sampling produced a slightly larger RMSE for SOM (1.87 g/kg) and AN (7.33 mg/kg) than RSM-based sampling (1.33 g/kg and 6.46 mg/kg, respectively). For AK, however, grid sampling produced a slightly smaller RMSE (43.66 mg/kg) than RSM-based sampling (47.83 mg/kg). In terms of prediction bias, grid sampling performed better than RSM-based sampling for all three soil properties.
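
Cross-validation removes each sample in turn, predicts it from the remaining samples and summarizes the errors; the mean error measures bias and the RMSE measures precision. A minimal leave-one-out sketch, assuming any interpolator with the signature `predict(samples, (x, y))`; the trivial mean predictor is only a stand-in for IDW or kriging, and all names are hypothetical:

```python
import math
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, value)
Predictor = Callable[[Sequence[Sample], Tuple[float, float]], float]


def leave_one_out(samples: Sequence[Sample], predict: Predictor) -> Tuple[float, float]:
    """Return (mean error, RMSE) from leave-one-out cross-validation."""
    samples = list(samples)
    errors = []
    for i, (x, y, z) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        # Error is predicted minus observed, so a positive mean means overestimation.
        errors.append(predict(rest, (x, y)) - z)
    n = len(errors)
    mean_error = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_error, rmse


def mean_predictor(rest: Sequence[Sample], target: Tuple[float, float]) -> float:
    """Trivial stand-in predictor: the mean of the remaining samples."""
    return sum(z for _, _, z in rest) / len(rest)


me, rmse = leave_one_out([(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 3.0)], mean_predictor)
print(me, rmse)
```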

Table 5 Comparisons of prediction precision and spatial heterogeneity for soil organic matter (SOM, g/kg), available nitrogen (AN, mg/kg) and available potassium (AK, mg/kg) using grid sampling and RSM-based sampling

In geostatistical terms, the nugget (C0) represents spatial heterogeneity induced by random factors, such as experimental error. The sill (C + C0) is the sum of the nugget variance and the spatially structured variance (C). The finite distance at which a bounded semivariogram reaches its sill is the range (A), i.e. the range of spatial dependence. The semivariogram parameters of the soil variables are also given in Table 5. Compared with grid sampling, the smaller C0 (0.001, 0.10 and 19.00 for SOM, AN and AK, respectively) and range (109.29, 109.12 and 50.06 for SOM, AN and AK, respectively) values for RSM indicate that the spatial heterogeneity of the soil maps produced by RSM is weaker. In addition, the soil maps produced by RSM still show spatial autocorrelation, given that the C0/C ratios are small (0.006, 0.002 and 0.026 for SOM, AN and AK, respectively). Given these results, we consider RSM-based sampling an acceptable method in view of its smaller sample size.
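
The nugget, sill and range are read from a model fitted to the experimental semivariogram. A minimal sketch of the classical (Matheron) estimator follows; the lag width and number of lags are hypothetical user choices, since the settings used for Table 5 are not reported. Note that 12 samples yield only 12 × 11/2 = 66 point pairs, so each lag class contains few pairs.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, value)


def empirical_semivariogram(samples: Sequence[Sample], lag: float,
                            n_lags: int) -> List[Tuple[float, float, int]]:
    """Classical (Matheron) estimator gamma(h) = sum((z_i - z_j)^2) / (2 N(h)).

    Pairs are binned into lag classes [k*lag, (k+1)*lag); returns
    (bin centre, semivariance, number of pairs) for every non-empty bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(samples)):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):  # each pair counted once
            xj, yj, zj = samples[j]
            k = int(math.dist((xi, yi), (xj, yj)) // lag)
            if k < n_lags:
                sums[k] += (zi - zj) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag, sums[k] / (2 * counts[k]), counts[k]) for k in sorted(counts)]
```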

4 Conclusions

In a reclaimed coastal tideland field near the Hangzhou Gulf, the spatial variability of soil properties was studied using response surface methodology (RSM) sampling with remotely sensed radar imagery and proximally sensed EM38 data. Radar imagery and EM38 data were used to indicate the spatial distribution of topsoil moisture and salinity, respectively. Based on the correlations between the soil properties and the ancillary data, RSM was used to determine an optimal set of 12 soil samples, which were then used to delineate the spatial distribution of SOM, AN and AK by inverse distance weighted interpolation. The maps produced from RSM-based sampling were similar to those generated from conventional grid sampling with 42 samples.

Soil moisture and salinity are two key factors affecting soil quality and crop choice on reclaimed land, so it is important to characterize their spatial variability. Because the acquisition of optical remote sensing images is often hampered by heavy cloud cover and adverse weather in the subtropical coastal zones of China, radar imagery is a promising tool for monitoring soil moisture. Combined with EM38 surveys, soil salinity information can be added to the database rapidly and cost-effectively and used to study the relationship between soil moisture and salinity on reclaimed saline tidelands.

RSM-based designs, which are commonly used to minimize the estimation variance of linear statistical models in a non-spatial setting, showed improved efficiency compared with conventional grid sampling and can produce continuous maps of the ground property of interest. Moreover, this approach lends itself naturally to the analysis of proximal sensor data. Indeed, many types of ground-, airborne- and satellite-based remotely sensed data are collected specifically because they are expected to correlate strongly with some property of interest (e.g. soil type or soil salinity) (Lesch et al. 1995a, b, 2000; Lesch 2005). In this study, RSM-based sampling was applied for the first time to a combination of remotely sensed radar imagery and proximally sensed EM38 data, and it improved sampling efficiency compared with grid sampling. In field applications, the financial budget and target resolution should be considered when determining the number of samples. However, RSM tended to be “attracted” to points with more extreme values, such as bare soil (skips and missing plants) and field edges; Johnson et al. (2005) also reported this tendency of RSM to choose extreme ECa values. The procedure for determining the optimized sampling sites therefore needs further improvement, because sites located close to field margins often increase uncertainty. Further work is also needed to determine whether RSM-based sampling can be applied at larger scales for natural resources management.