Introduction

Spatial data (SD) are represented as continuous functions of coordinates and time. To date, only a few functions of this type have been described mathematically, yet calculations using SD require both an understanding of the underlying mathematical dependencies and a consistent application of SD. Geographic information systems (GIS) resolve this contradiction: a GIS replaces unknown functional dependencies with discrete values stored in databases, which stand in for the continuous functions. The databases themselves can include data derived from both instrumental measurements and modelling. The basic ways of representing databases in GIS are maps and regular and irregular grids. Whether a data representation is adequate is determined by the precision required of subsequent calculations, i.e. by the discretisation in latitude, longitude, altitude, and time. In most cases, initial data are insufficient; this makes it difficult to evaluate the precision of primary SD in the GIS and its effect on the accuracy of data derived from them. Primary data require careful treatment, especially if the models to be used were developed for point data. Linking models to SD makes it possible to determine the inter-relations among variations in meteorological conditions, soil characteristics, and land use. Examples of this approach can be found in regional studies using models of soil organic matter (Lee et al. 1993; Donigian et al. 1995; Paustian et al. 1997; Falloon et al. 1998, 2001; Smith et al. 1999). The linkage of models such as RothC and CENTURY to GIS for a study area in Central Hungary was based on the influence of differences in soil properties (such as SOC stock and texture) and climate on potential SOC changes (Falloon and Smith 2002). Other sources of spatial variation in C inputs, such as land management details (crops, yields, shoot/root biomass, residue treatment), are much more difficult to compile, and their future changes are much harder to predict.
The cropping system and the economy interact strongly. In regional SOC estimates made with SOC models coupled to crop growth models, where productivity is a function of climate and soil parameters, it is usual to apply climate change to the cropping and economic system as it exists today. To decrease the uncertainty of such forecasts, agro-climatic models of crop distribution need to be integrated with farm-level economic responses.

These considerations were addressed in calculating changes in soil organic carbon stocks of the cultivated soils of European Russia, as described in this paper. We discuss the possibilities and limitations of using data of different resolution and display mode, and of calculating missing values, for modelling C dynamics at the regional scale based on the integration of biogeophysical, climatic, and management factors.

Methods

Models for which the data are required

The dataset was prepared to contain all of the data necessary to run three models of soil organic matter turnover: the dynamic models RothC (Jenkinson and Rayner 1977; Smith et al. 2005) and CANDY (Franko et al. 1995), and a statistical model, the Model of Humus Balance (Sirotenko et al. 2002). The models used are referred to as ecosystem models (Paustian et al. 1997): the first two were designed to simulate carbon dynamics at the point scale and have since been adapted for regional application. The statistical model describes changes in the C content of ploughed soddy-podzolic soils of Russia, Belarus, Ukraine, Lithuania, and Latvia, summarising information from 60 long-term experiments. Unlike macro-level models, these models contain no spatial scale represented as a regular grid. As a result, the GIS must describe both spatial and temporal factors, which requires data aggregation to be homogeneous and consistent with the model requirements (Paustian et al. 1997).

Although the models differ in their input data requirements, a summary table was derived that meets the requirements of all of them. The corresponding database contains the following parameters at a 10-daily or monthly time step: temperature, precipitation, solar radiation, potential evapo-transpiration, amounts of manure added, and input of plant residues; and the following single-value parameters: initial carbon content in soil, soil bulk density, and the contents of particles <0.002 mm and <0.0063 mm in size. All models calculate changes in soil organic carbon content over time.

For the calculations, it was necessary to consider changes in climate over the next 50 years and changes in the economic situation in agriculture, either together or separately. To meet these needs, auxiliary data were compiled and included in the same database. The time step of the data was set by each model: 10-daily (CANDY), monthly (RothC), or yearly (Humus Balance model).

Constructing the data for initialising the models

For deriving data at the beginning of the simulation period, the following datasets were available:

  • Soil map of the Russian Federation, 1:2,500,000 (Fridland 1988)

  • Land use map of the USSR, 1:4,000,000 (Yanvareva et al. 1989)

  • Schematic Natural and Agricultural Zoning Map, 1:8,000,000 (Shashko et al. 1984)

  • Dokuchaev Soil Institute Database of Russia’s soils based on published and reference data.

  • Regular climatic half-degree grid of the Globe (Mitchell et al. 2004).

  • USSR Political and Administrative Map, 1:8,000,000 (Mikhailenko and Bobkov 1988)

  • An economic database, by federal entities (Romanenko 2005; Romanenko et al. 2007).

The Soil Map of Russia includes approximately 25,000 mapping units (Fig. 1). Each mapping unit can be characterised by 1–4 different soils (for homogeneous units) or their regular pattern (for heterogeneous units), and by 1–2 types of parent material. A total of 205 soil types and 30 parent materials are distinguished. Complexes include 1–3 soils from the 205 possible. Therefore, one mapping unit of the Soil Map can represent a maximum of 4 × 3 × 2 = 24 soils; in reality, however, no more than 10 occur. One soil or complex is regarded as the predominating one in each unit, i.e. it occupies more than 50% of the unit area.

Fig. 1 Soil map of European Russia

The area shares of the predominating and accompanying soils within a complex cannot be determined from the available data. For calculation purposes, it was assumed that one accompanying soil occupies 25% of the unit area, and that two and three accompanying soils together occupy 35 and 45% of the unit, respectively. This procedure is similar to that proposed and used by Paustian et al. (1997) and Smith et al. (2000). The same ratios were applied to heterogeneous units. For these units the assumption is more uncertain, because each regular pattern of soils is characterised by its own ratio of components; however, a database on the percentage area of soils in regular patterns is not yet available.
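The area-share assumption above can be written as a small lookup table. This is a minimal sketch, not the authors' code: the text gives only the total share of the accompanying soils, so splitting that total equally among them, and treating a unit with no accompanying soil as 100% predominating soil, are our assumptions. The function names `soil_area_shares` and `area_weighted` are hypothetical.

```python
# Total share (%) of a mapping unit assigned to accompanying soils, keyed by
# their number, as stated in the text. The entry for 0 is an assumption.
ACCOMPANYING_TOTAL = {0: 0.0, 1: 25.0, 2: 35.0, 3: 45.0}


def soil_area_shares(n_accompanying: int) -> list[float]:
    """Return area shares (%): predominating soil first, then each
    accompanying soil. The equal split among accompanying soils is an
    assumption made for illustration."""
    if n_accompanying not in ACCOMPANYING_TOTAL:
        raise ValueError("expected 0-3 accompanying soils")
    total_acc = ACCOMPANYING_TOTAL[n_accompanying]
    predominant = 100.0 - total_acc
    each = total_acc / n_accompanying if n_accompanying else 0.0
    return [predominant] + [each] * n_accompanying


def area_weighted(values: list[float]) -> float:
    """Area-weighted unit mean of a per-soil property (e.g. SOC stock),
    with values ordered predominating soil first."""
    shares = soil_area_shares(len(values) - 1)
    return sum(v * s for v, s in zip(values, shares)) / 100.0


if __name__ == "__main__":
    print(soil_area_shares(2))          # [65.0, 17.5, 17.5]
    print(area_weighted([60.0, 40.0]))  # 55.0
```

Under these assumptions the predominating soil always keeps more than 50% of the unit (75, 65, or 55%), which is consistent with the definition of the predominating soil given earlier.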

The land use map includes approximately 10,000 mapping units. Each unit gives the share of ploughed soils, which ranges from 20 to 90% (Fig. 2). Other parameters of this map were not used in the present study.

Fig. 2 Distribution of ploughed lands in European Russia

The schematic map of natural and agricultural zoning consists of 200 mapping units (Fig. 3). Each unit has a set of agroclimatic characteristics that are relatively homogeneous within the unit. Reference data on the status of ploughed soils are given by natural and agricultural zone, because the quantitative characteristics of soils of the same type differ markedly between zones.

Fig. 3 Natural and agricultural zoning in European Russia

The soil database was initially created for the 205 soil types represented in the legend to the Soil Map of the Russian Federation, 1:2,500,000 (Fridland 1988). The classification used in this map was taken as the basis for further processing because this map is the most detailed. The database is organised by horizon, in accordance with the description of the typical soil profile of each soil type, and each horizon is characterised by several fields. A fragment of the database corresponding to the soil map is shown in Table 1.

Table 1 Podzolic, predominantly deep podzolic soils

In addition, the database contains data on the content of exchangeable cations (H, Al, Ca, Mg, and Na) and plant-available nutrients (N, P, and K). Because the properties of a soil type can vary significantly, all soil characteristics are represented as ranges; since the soil properties of each of the 25,000 mapping units cannot be characterised individually, this range-based representation is unavoidable. However, a large variation in the properties of one soil complicates calculations at the whole-country scale. To this end, the soil database was developed further, taking into account the granulometric composition of soils in mapping units within natural and agricultural zones. This increased the number of records in the database from 205 to 6,000 but reduced the range of variation of characteristics within each record. Table 2 shows a fragment of the database on organic matter content differentiated by natural and agricultural zone.

Table 2 Podzolic, predominantly deep podzolic soils

The climatic data were obtained from the Tyndall Centre for Climate Change Research (Mitchell et al. 2004). The data in the CRU TS 2.0 database are given on a global half-degree grid, where each node contains values interpolated from observations at ground-based meteorological stations (Figs. 4, 5).

Fig. 4 Distribution of June 2000 temperature in European Russia

Fig. 5 Distribution of June 2000 precipitation in European Russia

The data are available for the period from 1900 to 2000 at a monthly time step. The initial database contains the following characteristics: mean near-surface air temperature, mean precipitation, mean atmospheric humidity, mean cloudiness, and mean diurnal temperature range (Mitchell et al. 2004). The potential evapo-transpiration and solar radiation required for the modelling were calculated from these characteristics. Table 3 contains data averaged over 1961–1990 for the point located at 48° N, 44° E. The algorithms used for the calculations are described below.

Table 3 Fragment of a climatic database corresponding to a regular half-degree grid

Solar radiation was calculated from cloudiness and site latitude:

$$ Q = Qv\left[1 - (a + b\,n)\,n\right], $$

where Q is the total solar radiation under actual cloudiness, after Berland (cal/(cm² day); described in Kondratiev 1965); Qv is the possible total solar radiation under clear skies, a reference value depending on latitude and month of the year; n is cloudiness (fraction of the sky covered); and a and b are empirical coefficients, where a is a reference value depending on latitude and b was assumed to be 0.38.
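Berland's relation can be evaluated directly once the reference values are looked up. The sketch below is illustrative only: Qv and a must come from the latitude- and month-dependent reference tables, which are not reproduced here, and we assume that cloudiness n is expressed as a fraction of the sky (0–1), so a cloud cover given in percent, as we assume for the CRU data, is divided by 100. The function name and the input values in the usage example are hypothetical.

```python
def total_solar_radiation(qv: float, a: float, n: float, b: float = 0.38) -> float:
    """Berland's relation Q = Qv * [1 - (a + b*n) * n].

    qv -- possible total radiation under clear skies (reference table value)
    a  -- latitude-dependent empirical coefficient (reference table value)
    n  -- cloudiness as a fraction of the sky, 0..1 (assumed units)
    b  -- empirical coefficient, 0.38 as in the text
    """
    if not 0.0 <= n <= 1.0:
        raise ValueError("cloudiness n must be a fraction between 0 and 1")
    return qv * (1.0 - (a + b * n) * n)


if __name__ == "__main__":
    # Hypothetical inputs; cloud cover in percent is converted to a fraction.
    cloud_percent = 60.0
    q = total_solar_radiation(qv=500.0, a=0.35, n=cloud_percent / 100.0)
    print(round(q, 1))
```

With n = 0 the function returns Qv unchanged, and Q decreases monotonically as cloudiness increases over the range 0–1 for coefficients of this size.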

Potential evapo-transpiration from an open water surface was calculated according to Ivanov (1957), based on vapour pressure and mean air temperature:

$$ \begin{aligned} \mathrm{PEVT} &= 0.0018\,(25 + T)^{2}\,(100 - a) \\ a &= 100\,\frac{eA}{\mathrm{EA}} \\ \mathrm{EA} &= 6.11\,\exp\!\left(\frac{17.4\,T}{T + 239}\right), \end{aligned} $$

where

  • PEVT is potential evapo-transpiration (mm/month);

  • a is relative air humidity (%);

  • T is air temperature (°C);

  • eA is actual water vapour pressure (hPa);

  • EA is saturation water vapour pressure (hPa).
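Ivanov's formula can be sketched as follows. The function names are hypothetical, and we assume that the "mean atmospheric humidity" variable of CRU TS 2.0 is supplied as actual vapour pressure eA in hPa; the clipping of negative values (when eA exceeds EA) is a safeguard we added, not part of the original method.

```python
import math


def saturation_vapour_pressure(t: float) -> float:
    """EA (hPa) at air temperature t (deg C), using the coefficients in the text."""
    return 6.11 * math.exp(17.4 * t / (t + 239.0))


def pevt_ivanov(t: float, ea: float) -> float:
    """Ivanov's potential evapo-transpiration (mm/month).

    t  -- mean monthly air temperature, deg C
    ea -- actual water vapour pressure eA, hPa
    """
    rel_humidity = ea / saturation_vapour_pressure(t) * 100.0  # a, %
    pevt = 0.0018 * (25.0 + t) ** 2 * (100.0 - rel_humidity)
    # Safeguard added here (not in the text): no negative PEVT when eA > EA.
    return max(0.0, pevt)


if __name__ == "__main__":
    # Hypothetical summer month: 20 deg C, vapour pressure 11.7 hPa.
    print(pevt_ivanov(20.0, 11.7))
```

For the hypothetical summer month in the example (20°C, relative humidity of about 50%), the result is roughly 182 mm/month; at saturation (eA = EA) it falls to zero.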

In addition, CRU TS 2.0 includes scenarios of future climate for all five climatic variables for the period 2001–2100, projected by four climate models (IPCC 2001; Mitchell et al. 2004) under four emission scenarios from the Special Report on Emissions Scenarios (SRES; IPCC 2000), namely A1FI, A2, B1, and B2. In this paper, we use data derived from the HadCM3 climate model.

The Administrative Map of Russia (Mikhailenko and Bobkov 1988) shows the boundaries of the entities of the Russian Federation. In our calculations, several parameters from the economic database were considered: type of crop rotation, amounts and timing of fertiliser addition, and soil management practices. These are the most important parameters determining the amount and regime of organic carbon input from plant residues and fertilisers, and the conditions of soil organic matter accumulation and mineralisation. This database is available by federal entity (Romanenko 2005; Romanenko et al. 2007; Table 4).

Table 4 Fragment of economic database (Moscow oblast)

Assumptions

The database disregards variation in land use within a federal entity and gives a single characteristic of crop rotation, yield, and amounts of fertiliser used for each entity. This is a simplification, since land use is not uniform within administrative units: agricultural management practices differ depending on soil type, granulometric composition, and climatic conditions (Fig. 6; Table 5). The symbols used in Table 5 are as follows:

  • K is the coefficient of continentality, according to Ivanov (1957);

  • ST is the sum of mean daily air temperatures above 10°C;

  • MC is the coefficient of annual atmospheric moisture (ratio of precipitation amount to evaporation);

  • Bp is the climatic index of biological productivity, in grades, relative to a mean productivity.
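The indices ST and MC can be illustrated with a short sketch. It is a simplification for illustration only: the function names are hypothetical, ST is computed by summing every day whose mean temperature exceeds 10°C (the stricter agroclimatic convention of summing only over the period of stable temperatures above 10°C is not reproduced), and the annual evaporation for MC is taken as a given input. K and Bp are not covered because their formulas are not given here.

```python
def active_temperature_sum(daily_mean_temps: list[float], threshold: float = 10.0) -> float:
    """ST: sum of mean daily air temperatures above the threshold (deg C).

    Simplification: every day above the threshold is counted, without
    detecting the dates of stable transition through 10 deg C.
    """
    return sum(t for t in daily_mean_temps if t > threshold)


def moisture_coefficient(annual_precip_mm: float, annual_evap_mm: float) -> float:
    """MC: ratio of annual precipitation to annual evaporation."""
    if annual_evap_mm <= 0.0:
        raise ValueError("annual evaporation must be positive")
    return annual_precip_mm / annual_evap_mm


if __name__ == "__main__":
    # Hypothetical values for illustration.
    print(active_temperature_sum([5.0, 12.0, 15.0, 9.0, 11.0, 10.0]))
    print(moisture_coefficient(600.0, 400.0))
```

An MC above 1 indicates that precipitation exceeds evaporation; values below 1 indicate a moisture deficit.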

Most entities of the Russian Federation lie in several agro-climatic zones, each with different cultivated crops, yields, and amounts and timing of fertiliser addition. Figure 7 shows long-term mean cereal yields in the Moscow oblast.

Fig. 6 Natural and agricultural zoning of the Volgograd oblast (administrative region)

Table 5 Natural and agricultural zoning of Volgograd oblast

Fig. 7 Cereal yields in the Moscow oblast (administrative region)

Recalculating granulometric composition

The parameters used for the CANDY and RothC models to describe soil granulometric composition (the content of fractions <0.0063 and <0.002 mm) differ from standard Russian gradations (the content of fractions <0.01 and <0.001 mm). To resolve this discrepancy, the following empirical equation was used to recalculate the data from the Russian to the European system:

$$ Y = a + b\log_{2}(X), $$

where

  • Y is the content (%) of particles finer than size X;

  • X is the particle size, mm;

  • a and b are empirical coefficients obtained from the system of equations

    $$ \begin{aligned} Y1 &= a + b\log_{2}(0.01) \\ Y2 &= a + b\log_{2}(0.001) \end{aligned} $$

    where Y1 is the content of the fraction <0.01 mm (reference value) and Y2 is the content of the fraction <0.001 mm (reference value).
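The two-point fit and the recalculation can be sketched as below. Because Y is linear in log2(X), solving the 2 × 2 system gives b = (Y1 - Y2)/(log2 0.01 - log2 0.001) and a = Y1 - b log2 0.01; the <0.0063 mm and <0.002 mm contents then follow by evaluating the curve at those sizes. The function names and the example texture are hypothetical.

```python
import math


def fit_log2_curve(y1: float, y2: float) -> tuple[float, float]:
    """Solve Y1 = a + b*log2(0.01) and Y2 = a + b*log2(0.001) for (a, b).

    y1 -- content (%) of particles < 0.01 mm (Russian gradation)
    y2 -- content (%) of particles < 0.001 mm (Russian gradation)
    """
    x1, x2 = math.log2(0.01), math.log2(0.001)
    b = (y1 - y2) / (x1 - x2)
    a = y1 - b * x1
    return a, b


def fraction_content(a: float, b: float, size_mm: float) -> float:
    """Content (%) of particles finer than size_mm: Y = a + b*log2(X)."""
    return a + b * math.log2(size_mm)


def russian_to_european(y1: float, y2: float) -> dict[str, float]:
    """Recalculate <0.01/<0.001 mm contents to <0.0063/<0.002 mm contents."""
    a, b = fit_log2_curve(y1, y2)
    return {
        "<0.0063 mm": fraction_content(a, b, 0.0063),
        "<0.002 mm": fraction_content(a, b, 0.002),
    }


if __name__ == "__main__":
    # Hypothetical loam: 40% < 0.01 mm and 20% < 0.001 mm.
    print(russian_to_european(40.0, 20.0))
```

For the example, this gives about 36.0% of particles <0.0063 mm and about 26.0% <0.002 mm. Both target sizes lie between 0.001 and 0.01 mm, so the fitted curve is used only for interpolation, never extrapolation.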

Uncertainty

Digital elevation data (USGS), satellite observations (Landsat), and climatic data (HadCM3) have a precise geographic reference with known accuracy. The 1:1,000,000 topographic base used in the present work was found to be compatible with the geo-referenced GTOPO30 elevation data (Earth Science Data Interface 1997). All other cartographic materials used in the work were converted to this topographic base via projective and locally affine transformations, which were checked against satellite data integrated into the topographic base. The geo-referencing of the topographic base, satellite data, GTOPO30, and GPS survey deviated on average by 500 m. Further alignment of the referencing methods is in progress but is beyond the scope of the present study.

Results and discussion

Combining the data layers

Having made the initial data geographically compatible and calculated the missing input parameters for the models, we then addressed the unification of data differing in scale, discreteness, and representation method, as described below.

Several possibilities were considered.

  1. Use the climatic grid as the basis.

Advantages: regular grid, geo-referenced, no need to average the climatic database, and easy to determine the calculation accuracy.

Disadvantages: insufficient representativeness of soils for each grid node (Fig. 8) and a large number of records to consider: 3,000,000 records for each year.

  2. Use the soil map as the basis.

Advantages: maximum possible representativeness of sampling.

Fig. 8 A fragment of the soil map (1:2,500,000 scale) with points of the regular climatic grid

Disadvantages: the climatic data must be interpolated; an intersection of four maps is required; and the calculation table will contain approximately 20,000,000 records.

  3. Create a map of maximally homogeneous units based on the intersection of four maps. Units should be homogeneous in soil type, granulometric composition, administrative unit, natural and agricultural zone, and percentage of ploughed soils.

Advantages: the calculation table can be reduced to 10,000 records; sampling representativeness is adjustable (it is an order of magnitude higher than with the climatic grid and can be increased up to the maximum defined by the soil map if the size of the calculation table is increased); and no interpolation of the climatic database is required.

Disadvantages: the climatic data within each unit become range-based, so the representativeness of the climatic sampling has to be considered; and the joint map must be compiled.

  4. Create an arbitrary regular calculation grid.

Advantages: controlled sampling representativeness.

Disadvantages: a significant increase in table size (12,000,000 records or more); the climatic data must be interpolated.

Initial carbon pools calculated from the soil map as a whole and from sampling at the nodes of the half-degree grid can differ by more than 30%, which rules out option 1 for the proposed modelling. Options 2 and 4 are costly and time-consuming, so option 3 was chosen for this project.

The main principle for compiling the joint map is to minimise interpolation. Economic data are given by federal entity, whereas the ranges of soil and agrochemical data are initially given by natural and agricultural unit. Accordingly, new mapping units were delimited by dividing the federal entities (Fig. 9) along the boundaries of natural and agricultural zones (Fig. 3).

Fig. 9 Administrative units of European Russia

The areas obtained by intersecting these two maps were divided into sites with homogeneous agricultural use, and these in turn were divided into territories with homogeneous soil cover. In this way, 200 units with a uniform economic base, agrochemical characteristics, and land use were obtained (Fig. 10).
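The paper performs this overlay on vector polygons. As a hedged illustration only, the sketch below uses a raster analogue: if every cell of a common grid carries the attribute of each layer, a homogeneous unit is simply a distinct combination of attributes, and the cell count is a proxy for its area. The cell tuple layout and the function name `homogeneous_units` are hypothetical; a real workflow would use a GIS polygon intersection instead.

```python
from collections import Counter

# Each cell carries the attribute of every source layer at that location:
# (administrative unit, natural-agricultural zone, ploughed-share class, soil).
Cell = tuple[str, str, str, str]


def homogeneous_units(cells: list[Cell]) -> tuple[dict[Cell, int], Counter]:
    """Assign one unit ID to each distinct attribute combination and count
    how many cells (a proxy for area) fall into each unit."""
    unit_ids: dict[Cell, int] = {}
    areas: Counter = Counter()
    for cell in cells:
        # A new combination receives the next free ID.
        uid = unit_ids.setdefault(cell, len(unit_ids))
        areas[uid] += 1
    return unit_ids, areas


if __name__ == "__main__":
    cells = [
        ("Moscow", "Z1", "50-70%", "soddy-podzolic"),
        ("Moscow", "Z1", "50-70%", "soddy-podzolic"),
        ("Moscow", "Z2", "50-70%", "soddy-podzolic"),
        ("Tula", "Z2", "70-90%", "grey forest"),
    ]
    ids, areas = homogeneous_units(cells)
    print(len(ids), dict(areas))
```

Every attribute added to the tuple can only split units further, which is why the number of units, and hence of calculation records, grows with each layer included.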

Fig. 10 Map of homogeneous units based on the intersection of four basic maps

Complete homogeneity of soils was not achieved: up to three soils predominating within a unit were identified, on the assumption that together they occupy more than 80% of the unit area. With this, the cartographic basis for calculating carbon flows in the soils of Russia with different models was complete.
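The selection of predominating soils can be sketched as a greedy rule. This is our interpretation, not a procedure stated in the text: soils are taken in order of decreasing area share until their combined share exceeds 80% or three soils have been selected. The function name `predominating_soils` and the soil shares in the example are hypothetical.

```python
def predominating_soils(shares: dict[str, float], max_soils: int = 3,
                        target: float = 80.0) -> list[str]:
    """Pick soils in order of decreasing area share (%) until their combined
    share exceeds `target` or `max_soils` soils have been taken."""
    selected: list[str] = []
    cumulative = 0.0
    for soil, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        if len(selected) == max_soils or cumulative > target:
            break
        selected.append(soil)
        cumulative += share
    return selected


if __name__ == "__main__":
    unit = {"soddy-podzolic": 55.0, "podzolic": 20.0, "gley": 10.0, "peat": 15.0}
    print(predominating_soils(unit))
```

In the example, three soils covering 90% of the unit are selected, and the remaining 10% is left out of the calculations.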

Because we used four climatic scenarios, the structure of the records characterising one mapping unit was finalised as follows:

  • 1 unit: 3 soils;

  • 1 soil: 4 variants: ploughed lands, fodder lands, and natural soils with minimal and maximal carbon content;

  • 1 variant: 3 economic scenarios;

  • 1 economic scenario: 4 climatic scenarios.

In total, the database includes 38,400 possible records; after non-existent combinations were removed, 20,000 records remain. An example of the data set required to calculate soil carbon stock changes over 10 years with the RothC model, for one economic scenario and one climate change scenario, is given in Tables 6, 7, 8, 9, 10 and 11.

Table 6 Climatic data for one year (10 files required)
Table 7 Crop rotation
Table 8 Input of organic fertilisers
Table 9 Data about crop yields
Table 10 Data about harvesting terms
Table 11 Soil properties

Note. For the whole period, ten tables were used.

Sources of error

The resulting database, used for the calculation of soil carbon stock changes, has several sources of errors.

  1. The data characterise the percentage of ploughed soil but do not specify which soils are ploughed. The percentage of ploughed land was therefore assumed to be equal for all soils considered. Errors in the estimated organic carbon content resulting from this assumption could reach 10%.

  2. Up to 20% of soils are excluded from the calculations because of their small share in the soil cover of mapping units. This does not mean, however, that the averages are in error by 20%, because agrochemical characteristics vary relatively little within a unit. The error in mean carbon content resulting from this simplification does not exceed 2%.

  3. Economic data: the yield in different parts of a federal entity can deviate from the average by 25%.

  4. Lands withdrawn from agricultural use since 1992 can amount to 25–30%. Because data on the presence and location of these lands were not available, the calculations used the percentage of ploughed land for 1990, which is the maximum.

  5. Highly organic soils were not included in the calculations because the models used are parameterised to simulate SOC dynamics in mineral soils only. Although the percentage of organic soils in the ploughed lands of Russia is generally low, in some units it can reach 25% or more. To illustrate this, the composition of the soil cover of unit 25 is given in Table 12.

Although highly organic soils are present in 27% of the polygons in this region, few of them are used for agriculture. For example, in the south taiga, organic soils contain 37,300 Tg C, but organic soils under cropland contain only 99 Tg C, compared with 3,600 Tg C in mineral soils under cropland in the same region (Orlov et al. 1996).

  6. The variation of climatic characteristics within a unit can reach 15%.

  7. The mean range of variation of all soil properties in the initial database is 5–7%.

Table 12 Example of information held about the soil types within a mapping unit

Conclusions and recommendations for future improvements

We have not examined all possible ways of increasing calculation accuracy, only those that can be implemented at present. Future improvements include the following:

  1. The economic data could be recompiled on the basis of the averaged joint map, at a level of detail corresponding to the regional division of Russia. Where a region is divided by an agroclimatic boundary, its units should be further separated.

  2. A special layer and database describing long-fallow soils, which are currently not included, should be created. Highly organic soils should also be included in the future.

  3. The joint map of homogeneous units should be refined to reduce climatic errors, with the aim of reducing the variation of climatic data within units.

In the course of the present study, a GIS characterising the soils of the European part of Russia was created. It is suitable for multiple calculations and modelling runs and was tested with models simulating changes in soil carbon stocks (Smith et al. 2007a, b). Further development of this GIS will allow it to be used for other purposes, such as monitoring different soil properties. The proposed concept of collecting and averaging data, as described in this paper, reduces the computational load of C dynamics calculations to a manageable level while maintaining acceptable accuracy with quantified uncertainty.