Introduction

Many soil properties (biological, chemical and physical) can be seen as spatiotemporal processes, and modelling their spatiotemporal distributions is essential in many environmental sciences. Ecologists are often interested in the environmental factors that regulate plant and soil microbe community diversity across temporal and spatial scales, the impact of human activity on this diversity and the consequences of this diversity for ecosystem processes (Cao et al. 2010). Analyses of spatial variation in plant diversity and soil properties in response to soil contamination and environmental factors may be a useful tool for developing sustainable soil management and soil pollution control (Andreasen and Skovgaard 2009; Gao et al. 2010a). Data sets in ecology and evolution often fall outside the scope of the methods taught in introductory statistics classes (Bolker et al. 2009). Where basic statistics rely on normally distributed data, ecology and evolution data are often binary (e.g. presence or absence of a species at a site). Ecologists have more recently begun to appreciate the importance of random variation in space and time or among individuals.

Geostatistics offer a variety of methods to model such processes as realizations of random functions, and there are examples of such geostatistical applications in studies on atmospheric pollution (Rouhani et al. 1992; Vyas and Christakos 1997), earth geophysics (Handcock and Wallis 1994; Bogaert and Christakos 1997), soil moisture content (Goovaerts and Sonnet 1993; Heuvelink et al. 1997; Cao 2008), rainfall or piezometric head fields (Rouhani and Wackernagel 1990; Armstrong et al. 1993), risk of exposure to pollutants (Carroll et al. 1997; Christakos and Hristopulos 1998), soil impedance (Castrignanò et al. 2002) and ecology (Hohn et al. 1993). An exhaustive review of geostatistical space-time models was given by Kyriakidis and Journel (1999). Moreover, geostatistics offer many methods for modelling spatial uncertainty and for risk assessment of both continuous and categorical attributes (Goovaerts 1997). Geostatistics are often combined with principal components analysis (Goovaerts 1997; Wackernagel 2003; Webster and Oliver 2007), and the same may be possible for canonical correspondence analysis (CCA) and redundancy analysis (RDA) (Webster and Oliver 1990).

How to integrate temporal and spatial data on the soil environment, and thereby gain more useful information for assessing soil health and devising optimal management strategies, remains an unsolved problem. Gao et al. (2010b) studied the spatial characteristics of soil enzyme activities and microbial community structure under different land uses on Chongming Island, China. Based on the spatial data from Chongming Island and the multivariate analysis methods for ecological data provided by Lepš and Šmilauer (2003), the aim of this paper is to put forward an approach that combines spatial biostatistics and geostatistics with a geographic information system (GIS) to better assess soil health and to guide land use and ecological management.

Research methods

Data collection and processing

This paper takes the spatial distribution of heavy metals and soil enzyme activities on Chongming Island as an example. Chongming Island is divided into four major functional districts: industrial use, agricultural use, commercial use (including residential areas) and wetland (Fig. 1). Details of each land use and the main soil characteristic parameters are given in Gao et al. (2010b, c). A total of 294 sampling points were located by GPS, and at each point a mixed soil sample (about 1 kg wet weight) was collected within a radius of 100 m and placed in a sterile plastic bag. After biological, chemical and physical analysis of these samples, a database of the soil properties of Chongming Island was established.

Fig. 1 Land use of Chongming Island, China (121°09′30″–121°54′00″E, 31°27′00″–31°51′15″N)

Spatial data on Chongming Island were collected according to a stratification scheme, using elevation (300 m bands), aspect (four classes, centered on N, S, E and W), slope, soil layer (0–20, 20–40, 40–60 cm) and five geologic substrates. The spatial data were mapped with ArcGIS 9.0. CCA is a useful tool for showing the relationships between microbial communities and soil enzyme activity in response to soil contamination and environmental factors. In RDA, sampling time is treated as a covariable, and the interactions between treatment levels and sampling times serve as the environmental variables. Moreover, geostatistics are extensively used to assess the level of soil contamination and to calculate risk at contaminated sites while preserving the spatial distribution and uncertainty of the estimates. They facilitate quantification of the spatial features of soil parameters and enable spatial interpolation. The CCA- and RDA-based models were calibrated in CANOCO.

The process of spatially relating the data on Chongming Island is summarized in Fig. 2. First, data on the environmental variables (e.g. microorganism community, enzyme activity, soil gene polymorphisms, soil physical and chemical properties, plant diversity, climate, land use, and the corresponding geographical coordinates and sampling times) are collected. Second, predictor and response variables are defined, usually as three types: species variables, environmental variables and covariables. A species variable is a measured response to environmental factors; the environmental variables include the contents of heavy metals and organic pollutants in soil. Covariables are binary (i.e. presence/absence), and related parameters include fertilization, sampling time, climate, soil layer depth, management measures, principal components, and so on.

Fig. 2 Flow chart of the assessment of soil health

Canonical correspondence analysis

Canonical correspondence analysis (CCA) is used to predict the distributions of plant species or microorganism communities, principal pollutants and environmental variables, and their interactions. In this direct gradient analysis technique, the main axes of a correspondence analysis are constrained to be linear combinations of environmental descriptors, which makes it comparable to linear regression. The ordination diagram of CCA displays species, environmental variables and covariables (Fig. 3). CCA is a multivariate extension of weighted averaging ordination, which is a simple method for arranging species along environmental variables (Andreasen and Skovgaard 2009). CCA constructs those linear combinations of environmental variables along which the distributions of the species are maximally separated (Ter-Braak 1986; Guisan et al. 1999), and the eigenvalues produced by CCA measure this separation. In the ordination diagram, the approximated correlation between two variables is equal to the cosine of the angle between the corresponding arrows. Therefore, arrows pointing in the same direction correspond to variables that are predicted to have a large positive correlation, whereas variables with a large negative correlation are predicted to have arrows pointing in opposite directions.
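The cosine rule for reading arrows in the ordination diagram can be illustrated with a minimal Python sketch. The arrow coordinates below are hypothetical values chosen for illustration, not scores from Fig. 3, and the function name approx_correlation is an assumption of this sketch.

```python
import numpy as np

def approx_correlation(arrow_a, arrow_b):
    """Cosine of the angle between two arrows given as 2-D ordination coordinates."""
    a = np.asarray(arrow_a, dtype=float)
    b = np.asarray(arrow_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical arrow tips on the first two ordination axes (illustration only)
arrow_cu = (0.8, 0.3)
arrow_pb = (0.7, 0.4)        # points in nearly the same direction as arrow_cu
arrow_other = (-0.6, -0.2)   # points in nearly the opposite direction

same_direction = approx_correlation(arrow_cu, arrow_pb)     # close to +1
opposite = approx_correlation(arrow_cu, arrow_other)        # close to -1
```

The value is only the correlation as approximated by the diagram; the quality of that approximation depends on how much variation the displayed axes capture.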

Fig. 3 The relationships of monitored species, environmental variables and covariables

The species points can also be projected perpendicularly onto the arrows of environmental variables or covariables, which gives the approximate ordering of the species in order of increasing value of that environmental variable (proceeding towards the arrow tip and beyond it). The environmental variables (and covariables) are always centered (and standardized) before the ordination model is fitted. Thus, when species points are projected onto the variable arrows, a projection point near zero (the origin of the coordinate system) corresponds to the average value of that particular environmental variable for that species.
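The perpendicular projection described above amounts to a dot product with the unit vector along the arrow. The following Python sketch, with hypothetical species coordinates and a hypothetical Cu arrow, orders species by their projection; it is an illustration of the reading rule, not a reproduction of the CANOCO output.

```python
import numpy as np

def order_along_arrow(points, arrow):
    """Sort species by their perpendicular projection onto an arrow.

    points: dict mapping species name -> (x, y) ordination coordinates
    arrow:  (x, y) coordinates of the arrow tip (the arrow starts at the origin)
    """
    direction = np.asarray(arrow, dtype=float)
    direction = direction / np.linalg.norm(direction)
    scores = {name: float(np.dot(xy, direction)) for name, xy in points.items()}
    return sorted(scores, key=scores.get), scores

# Hypothetical coordinates (illustration only)
species_points = {
    "species_A": (-0.5, 0.1),
    "species_B": (0.0, 0.0),   # at the origin: average value of the variable
    "species_C": (0.9, 0.2),
}
cu_arrow = (1.0, 0.2)

ordering, projections = order_along_arrow(species_points, cu_arrow)
```

Here ordering lists the species from the lowest to the highest predicted value of the variable, and a projection of zero marks the average value.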

Figure 3 shows that soil health parameters (e.g. microorganisms, enzyme activity and plant diversity) are significantly correlated with the corresponding environmental factors (e.g. Cu, Hg, Pb, Cd and organic pollutants). It also shows the interactions among environmental variables and the relationships between environmental variables and covariables. The species–environment correlation can be measured for each axis as the correlation of the respective multidimensional coordinates of the species occurrences in both the species and the environmental space. The CCA displays a clear pattern for the species and variables, and the main source of variation in the occurrence of the different patterns is shown by their clear separation from each other. The CCA indicates a pattern similar to that of the cluster analyses, but the CCA display is vaguer because it shows all the observations (Andreasen and Skovgaard 2009).

Spatial uncertainty and hazard assessment

After the principal species variables and environmental variables have been selected by CCA, the spatial environmental data for each sampling point are extracted with GIS. Geostatistical methods are then used to estimate and quantify the spatial distribution of species variables and pollutants under different land uses. The principal species and environmental variables, with their corresponding geographical coordinates, are imported into ArcGIS 9.0 to create hazard assessment maps by ordinary kriging and semivariance analysis. Other spatial interpolation methods can also be selected, depending on the spatial distribution characteristics. CCA identified phosphatase and Cu as the principal species variable and environmental variable, respectively. Figures 4 and 5 show that areas of high phosphatase activity coincide with areas of high Cu concentration.
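To make the ordinary kriging step concrete, the following Python sketch solves the ordinary kriging system for a single target location with NumPy. It is not the ArcGIS procedure used for Figs. 4 and 5: the spherical semivariogram model, its parameters (nugget, sill, range a), the coordinates and the values are all assumptions made for illustration.

```python
import numpy as np

def spherical(h, nugget, sill, a):
    """Spherical semivariogram with range a; gamma(0) is set to 0."""
    h = np.asarray(h, dtype=float)
    rising = nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    gamma = np.where(h < a, rising, sill)
    return np.where(h == 0.0, 0.0, gamma)

def ordinary_kriging(coords, values, target, nugget, sill, a):
    """Return (estimate, kriging variance, weights) at one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)

    # Semivariances between all pairs of sample points
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = spherical(d, nugget, sill, a)
    A[:n, n] = 1.0   # Lagrange multiplier column
    A[n, :n] = 1.0   # unbiasedness constraint: weights sum to 1

    # Semivariances between each sample point and the target
    b = np.ones(n + 1)
    b[:n] = spherical(np.linalg.norm(coords - target, axis=1), nugget, sill, a)

    solution = np.linalg.solve(A, b)
    weights, mu = solution[:n], solution[n]
    estimate = float(weights @ values)
    variance = float(weights @ b[:n] + mu)
    return estimate, variance, weights

# Hypothetical samples at the corners of a unit square (illustration only)
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]
estimate, variance, weights = ordinary_kriging(
    coords, values, target=(0.5, 0.5), nugget=0.1, sill=1.0, a=2.0
)
```

Because the target sits at the center of the square, the four weights are equal and the estimate is the mean of the four values. In practice the semivariogram model would be fitted to the empirical semivariances rather than assumed.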

Fig. 4 Spatial distribution characteristic of phosphatase at soil surface layer (0–20 cm) in Chongming Island

Fig. 5 Spatial distribution characteristic of Cu at soil surface layer (0–20 cm) in Chongming Island

Semivariance is an autocorrelation statistic defined as:

$$ \gamma (h) = {\frac{1}{2N(h)}}\sum\limits_{i = 1}^{N(h)} {\left[ {Z(x_{i} ) - Z(x_{i} + h)} \right]^{2} } $$
(1)

where γ(h) is the semivariance for the interval distance class h, Z(x_i) is the measured sample value at point x_i, Z(x_i + h) is the measured sample value at point x_i + h, and N(h) is the total number of sample pairs for the lag interval h.
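Equation (1) can be evaluated directly from the sample coordinates and values. The Python sketch below bins all point pairs by separation distance and applies Eq. (1) within each lag class; the function name, the lag class edges and the transect data are hypothetical and serve only as an illustration.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_edges):
    """Empirical semivariance per lag class, following Eq. (1).

    lag_edges: increasing distances; class k holds pairs with
               lag_edges[k] < distance <= lag_edges[k + 1].
    Returns (gamma, counts), where counts[k] is N(h) for class k.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)          # each pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2

    gamma, counts = [], []
    for lo, hi in zip(lag_edges[:-1], lag_edges[1:]):
        mask = (dist > lo) & (dist <= hi)
        n_pairs = int(mask.sum())
        counts.append(n_pairs)
        gamma.append(sqdiff[mask].sum() / (2 * n_pairs) if n_pairs else np.nan)
    return np.array(gamma), np.array(counts)

# Hypothetical transect with unit spacing (illustration only)
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
gamma, counts = empirical_semivariogram(coords, values, lag_edges=[0.5, 1.5, 2.5])
```

For the first lag class there are three pairs with squared differences 1, 4 and 9, so γ = 14 / (2 × 3); for the second there are two pairs with squared differences 9 and 25, so γ = 34 / (2 × 2).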

The nugget value generally reflects experimental error and variation at scales smaller than the sampling interval, whereas the sill reflects the total variability in the system. Figure 4 shows that the sill for the phosphatase distribution is almost three times the nugget, which indicates that the spatial variance of enzyme activity arises mostly from within the system rather than from small-scale sampling error (Komnitsas and Modis 2006). The ratio of nugget to sill, referred to as the base effect, can be used to describe spatial variability: a higher base effect indicates that spatial variability is more strongly affected by random factors, and hence that spatial correlation is weaker. The main shortcoming of estimation maps lies in the smoothing effect, which entails overestimation of low values and underestimation of high values. When two or more pollutants occur simultaneously in a soil quality evaluation, it is necessary to determine the total area affected by any of the pollutants (Franco et al. 2006). This leads to the problem of delineating the area contaminated by several pollutants at once. The geostatistical map of soil quality indicates that there are still some large areas of low soil quality that need to be mapped and considered in soil management (Rodríguez et al. 2009).
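The base effect mentioned above is a simple ratio, shown here as a short Python sketch. The nugget and sill values are placeholders chosen so that the sill is three times the nugget, as reported for phosphatase; they are not the fitted values from the study.

```python
def base_effect(nugget, sill):
    """Ratio of nugget to sill; values near 1 indicate mostly random variation."""
    if sill <= 0:
        raise ValueError("sill must be positive")
    return nugget / sill

# Placeholder values with sill = 3 x nugget (illustration only)
ratio = base_effect(nugget=1.0, sill=3.0)   # one third
```

A ratio of about one third means that roughly two thirds of the total variance is spatially structured under this reading.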

Redundancy analysis

The next substep after CCA is redundancy analysis (RDA). Because the data are repeated observations that include the baseline (before treatment) measurements, the interaction of treatment (including land use, fertilization, population and management measures) with time is of greatest interest and corresponds to the effect of the experimental manipulation. RDA, a method based on a linear species response, is used because the species composition in the plots is rather homogeneous and the explanatory variables are categorical. By using various combinations of environmental variables and covariables in RDA with an appropriate permutation scheme, tests analogous to the significance tests of particular terms in ANOVA models (including repeated measurements) can be constructed. Figure 6 shows that the covariables change with time. RDA can show the interactions among covariables, the changes in the experimental variables, and the interactions of the experimental variables with time.
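The core of RDA, a multivariate linear regression of the species data on the explanatory variables followed by a principal components analysis of the fitted values, can be sketched in a few lines of Python. This sketch is not the CANOCO implementation: it omits covariables, standardization options and permutation tests, and the data matrices are hypothetical.

```python
import numpy as np

def rda(Y, X):
    """Minimal RDA: regress centered Y on centered X, then PCA of the fitted values.

    Y: (n_samples, n_species) response matrix
    X: (n_samples, n_env) explanatory matrix
    Returns (eigenvalues of the constrained axes, fraction of variance explained).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)

    # Least-squares fit of every species on the explanatory variables
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ B

    # PCA of the fitted values via singular value decomposition
    singular_values = np.linalg.svd(Y_fit, compute_uv=False)
    eigenvalues = singular_values ** 2 / (n - 1)
    total_variance = (Yc ** 2).sum() / (n - 1)
    return eigenvalues, float(eigenvalues.sum() / total_variance)

# Hypothetical data: six samples, two explanatory variables, three species
X = np.array([[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]], dtype=float)
W = np.array([[1.0, 0.5, -1.0], [2.0, 0.0, 1.0]])
Y_linear = X @ W                       # species respond exactly linearly
E = np.zeros((6, 3))
E[:, 0] = [1.0, -2.0, 1.0, 0.0, 0.0, 0.0]   # variation X cannot explain
Y_noisy = Y_linear + E

_, explained_linear = rda(Y_linear, X)
_, explained_noisy = rda(Y_noisy, X)
```

When the species respond exactly linearly, the explained fraction is 1; adding variation unrelated to the explanatory variables lowers it. A partial RDA with covariables would first remove the covariable effects from both matrices before this step.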

Fig. 6 Principal response curves created using RDA, and vertical 1D plots with plant cover (a) and soil enzyme activity (b) scores on the RDA axis

The diagram of the principal response curves (PRC) shows two directions of departure from the covariables on the reference area. Axes a and b show plant cover and soil enzyme activity, respectively (Fig. 6). Together, the PRC diagram and the corresponding axis a or b show the effects of the different impact factors, and of their interactions, on plant cover and soil enzyme activity over time. For example, the score of invertase activity on axis b is around +1.5, and the PRC score of the soil layer covariable (F in the diagram) in the year 2010 is approximately +0.5. The estimated change is therefore exp(1.5 × 0.5) = 2.12, so invertase activity is predicted to be, on average, more than twice as high under the soil layer change as under cultivation management (Ter-Braak and Šmilauer 2002). Furthermore, the species scores on the first principal response axis can be compared with species traits (Fig. 6).
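The arithmetic in the invertase example can be checked with a short Python sketch. It assumes, as the use of exp in the text implies, that the response data were natural-log transformed before the analysis; the function name is hypothetical.

```python
import math

def prc_fold_change(species_score, prc_score):
    """Estimated multiplicative change: exp(species score x PRC score).

    Assumes the response variable was natural-log transformed.
    """
    return math.exp(species_score * prc_score)

# Scores read from the text: invertase +1.5, soil layer covariable in 2010 +0.5
fold = prc_fold_change(1.5, 0.5)   # exp(0.75), approximately 2.12
```

A product of zero gives a fold change of 1, that is, no difference from the reference.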

In this example, the PRC diagram clearly shows that the effects of the impact factors on species diverge throughout the years, particularly for the soil layer variable (which confirms that using time as a quantitative variable is a good approximation). In a longer run of the experiment, some stabilization would probably be found, and the time needed to reach a ‘stable state’ could be estimated (Van-den-Brink and Ter-Braak 1999; Lepš and Šmilauer 2003). Based on the combined information from all the analyses, many species (particularly the small ones) are suppressed by environmental factors, by the absence of soil management, or by a combination of both. Thus, the degree of soil health and the major pollutants in the area can be diagnosed. Using the hazard assessment maps (e.g. Figs. 4 and 5), decisions can be made on which soil management strategies and environmental control measures to adopt to improve soil health and alleviate environmental pollution.

Conclusion

Controlling environmental pollution and improving soil health at large scales remain difficult, unsolved environmental problems. The hazard assessment maps and RDA results make it possible to diagnose the degree of soil health, to predict where pollution events occur, and to trace the causes and sources of pollution. Based on this diagnosis and assessment, ecologists and environmental managers can work towards optimal ecological management by controlling the principal pollutants, environmental variables or covariables. Soil management policy can be spatially combined with environmental data, reducing the sampling effort and providing predictions of trends in environmental factors in the area. Rapid environmental decisions based on visualized management of soil health and contamination are still a long way off, but the development of “3S” techniques shows promise for speeding up the process.