Introduction

High resolution stratigraphic pollen sequences can potentially provide detailed information about past vegetation evolution and climatic change, and studies of the relationships among modern pollen assemblages, vegetation and climatic variables are important for improving the interpretation of fossil records. Quantitative climate reconstructions from pollen data based on various numerical methods, such as the Modern Analogue Technique (Overpeck et al. 1985; Simpson 2011) and the Multivariate Calibration-Function Approach (ter Braak and Juggins 1993; Birks et al. 2010; Juggins and Birks 2012), are an important source of information on past climate changes. However, human influences on vegetation, such as cultivation, deforestation, irrigation and the introduction of exotic species, are pervasive in most global ecosystems and can potentially obscure natural vegetation-climate relationships and distort the results of quantitative reconstructions (St. Jacques et al. 2008, 2015; Li et al. 2014b; Ding et al. 2017; Tian et al. 2017). Therefore, quantitative reconstruction of vegetation and climate using pollen data from densely populated regions is potentially challenging for palynologists (Tarasov et al. 1999; Seppä et al. 2004; Bjune et al. 2010; Xu et al. 2010a; Juggins and Birks 2012).

In China, pollen-based quantitative climate reconstructions have been mainly conducted in the Tibetan Plateau (Tang et al. 2000, 2009; Shen et al. 2006; Herzschuh et al. 2009, 2010; Lu et al. 2011; Chen et al. 2014; Wang et al. 2014; Opitz et al. 2015) and northern China (Jiang et al. 2009; Wen et al. 2010, 2013; Xu et al. 2010b; Sun and Feng 2013; Chen et al. 2015; Stebich et al. 2015), and these studies have demonstrated the prevailing patterns of vegetation and climate changes during the late Pleistocene. However, China is the most populated country on Earth with a history of agriculture dating back to the early Holocene (An 1989; Barton et al. 2009; Lu et al. 2009; Zhao et al. 2012). Early agriculture would be expected to have some degree of impact on vegetation and thus reconstructions of palaeovegetation and palaeoclimate may be biased by human impacts. Consequently, such impacts on both modern pollen reference datasets and fossil pollen records need to be considered when using them to reconstruct past climates.

In the past two decades, several modern pollen studies have been conducted in China to investigate human impacts on vegetation (Liu et al. 2006; Ma et al. 2009; Wang et al. 2009; Zhang et al. 2010; Huang et al. 2011; Ding et al. 2011; Li et al. 2012, 2013, 2015; Wei and Zhao 2016). Samples representing human-influenced vegetation communities were excluded from representative modern pollen reference datasets (Zheng et al. 2014) when quantitatively reconstructing climate. However, pollen-based climate reconstructions may still potentially be biased. The Hexi Corridor and Xinjiang, in Central Asia, are important routeways of the ancient Silk Road, and were major crossroads of ancient economic and cultural exchange between the East and West. Agricultural activity in the region was focused on oases and has sustained the local human economies for thousands of years. However, little is known of the impact of human activities on the vegetation and surface pollen spectra in the region (Ma et al. 2009; Huang et al. 2011; Wei and Zhao 2016).

Here, we present the results of a study of modern pollen assemblages in 143 topsoil samples from different local vegetation types along a transect from Lanzhou to Urumqi in northwestern China. Our aims are as follows: (1) to investigate the relationships between modern pollen assemblages and both the modern vegetation and regional climate based on constrained and partial canonical ordination techniques (RDA and partial RDA), and (2) to explore relationships between human impacts and surface pollen assemblages in different vegetation zones. Our results potentially provide an improved basis for the reconstruction of palaeovegetation, palaeoclimate and human impacts in the region and in other arid and semi-arid regions.

Regional setting

The study area extends from 35° to 48°N and from 75° to 110°E, with elevations ranging between − 87 and 3,464 m (Fig. 1A). Climatically, the mean annual precipitation (MAP) varies from 33 mm in the Gobi Desert, to more than 480 mm in the Qilianshan Mountains, and the mean annual temperature (MAT) is 0 to 13 °C (Fig. 1B).

Fig. 1 Location of the study region and sampling sites. A Location of the study area. B Distribution of mean annual precipitation (MAP) and mean annual temperature (MAT)

The vegetation is temperate steppe and temperate desert, and includes seven vegetation sub-zones: southern temperate forest-(meadow) steppe, southern temperate desert steppe, temperate subshrub/dwarf arbor desert, temperate shrub/grassland semi-desert, temperate shrub/subshrub desert, warm-temperate shrub/subshrub desert and warm-temperate shrub/subshrub bare dry desert (Hou 2001; Fig. 2). The dominant vegetation composition of these zones (Xinjiang Integrated Expedition and Institute of Botany, Chinese Academy of Sciences 1978) is as follows: Temperate forest-(meadow) steppe: Picea crassifolia, P. wilsonii, Quercus liaotungensis, Pinus armandii, P. tabulaeformis, Ostryopsis davidiana, Stipa bungeana, S. glareosa, S. grandis, Orinus kokonorica, Festuca ovina, Achnatherum splendens, Koeleria cristata, Poa annua and Carex condilapis. Desert steppe: Stipa, Festuca sulcata, Artemisia, Anabasis salsa, A. brevifolia, Nanophyton erinaceum, Eurotia ceratoides, Tanacetum trifidum, T. achillaeoides, Allium and Caragana. Shrub deserts: Ephedra, Zygophyllum, Nitraria, Calligonum, Atraphaxis, Gymnocarpos and Convolvulus. Dwarf arbor desert: Haloxylon ammodendron and H. persicum. Sub-shrub deserts: Calligonum leucocladum, Asterothamnus fruticosus, Reaumuria, Eurotia ceratoides, Kalidium schrenkianum, Iljinia regelii and Sympegma regelii. The vegetation of the Qilianshan and Tianshan Mountains is mainly composed of boreal coniferous forest and mountain steppe (Hou 2001), and Picea, Abies, Pinus and Larix are the major components of the boreal coniferous forest.

Fig. 2 Distribution of vegetation types in the study area (modified from Hou 2001)

Irrigation agriculture is well developed in the oasis areas, where, for example, Triticum aestivum, Zea mays, Gossypium and Vitis vinifera are widely cultivated. In the vegetable-growing areas, Solanum tuberosum, Capsicum frutescens, Cuminum cyminum, Helianthus annuus, Lycopersicon esculentum and Citrullus lanatus are planted. Around the farmlands and vegetable-growing areas, Populus, Salix, Elaeagnus oxycarpa and Ulmus pumila are common, while in the wastelands, Tamarix chinensis, T. ramosissima, Lycium chinense, Achnatherum splendens, Caragana sinica and Peganum harmala are present.

Materials and methods

Field work

A total of 143 topsoil pollen samples was collected along a transect from Lanzhou to Urumqi at roughly 50-km intervals in late June of 2008 and 2009 (Fig. 1; ESM). At each sampling site, 4–5 samples of topsoil were randomly collected and then combined into a single sample. To compare the pollen assemblages of different human-influenced vegetation types, two sampling points were selected at culturally impacted sites: one sample from farmland and the other from the surrounding area (here, we use the term ‘wasteland’ to refer to this zone). Samples were also collected from other vegetation types, such as desert, steppe, alpine meadow, forest and river valleys. The location of each sampling site was determined using GPS. In addition, vegetation inventories were performed at each sampling site within a 10 × 10 m quadrat for forest and a 5 × 5 m quadrat for other vegetation types. The vegetation types and the major plant taxa were recorded in the field (ESM), and the total vegetation cover as a percentage of the total land area and the cover of individual taxa were also measured. For each species, the vegetation cover was estimated from the vertical projection onto the ground of tree crowns or of the stems and leaves of grasses and shrubs. The vegetation cover data are referenced below but are not shown in the ESM. Of the 143 topsoil sampling sites, 54 were from farmland, 25 from wasteland, 37 from desert, 17 from steppe/meadow, 7 from forest and the other 3 from river valleys (Fig. 1; ESM).

Laboratory analysis

2–10 g of topsoil sample were used for pollen analysis. The samples were processed using standard procedures (Fægri and Iversen 1989; Moore et al. 1991): 10% HCl was used to dissolve carbonate; 10% KOH and 40% HF to remove humic components and silicates, respectively; and acetolysis and sieving with a 7-μm mesh to remove cellulose and humic material and clay-sized particles, respectively. A tablet containing a known number of Lycopodium spores (about 12,542 ± 2,081) was added to each sample as a tracer and for the calculation of pollen concentration. Pollen taxa were identified under an optical microscope at 400 × magnification. At least 400 terrestrial pollen grains were counted for each sample. The pollen identifications followed a modern reference collection and cereal-type pollen was identified using a standard cut-off grain diameter of 40 µm, based on previous research (Andersen 1979; Chester and Raine 2001). Pollen percentages were calculated based on the sum of total terrestrial pollen, and pollen diagrams were plotted using Tilia 2.0.2 (Grimm 2004).
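The two calculations described above can be sketched as follows. This is a minimal illustration, not the authors' own script: it assumes the standard exotic-marker formula for concentration (pollen counted divided by Lycopodium counted, multiplied by the number of Lycopodium spores added and divided by the sample mass), which the text implies but does not write out. The function names and all counts are hypothetical; only the tablet value of about 12,542 spores comes from the text.

```python
def pollen_percentages(counts):
    """Percentage of each taxon relative to the total terrestrial pollen sum."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the pollen sum is zero")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def pollen_concentration(pollen_counted, lycopodium_counted,
                         lycopodium_added, sample_mass_g):
    """Pollen grains per gram, estimated from the exotic-marker ratio."""
    if lycopodium_counted <= 0 or sample_mass_g <= 0:
        raise ValueError("marker count and sample mass must be positive")
    return pollen_counted / lycopodium_counted * lycopodium_added / sample_mass_g


# Hypothetical counts for one sample (not data from this study).
counts = {"Amaranthaceae": 200, "Artemisia": 120, "Poaceae": 80}
percentages = pollen_percentages(counts)

# 400 terrestrial grains and 250 Lycopodium spores counted, one tablet
# (about 12,542 spores) added to 5 g of topsoil.
concentration = pollen_concentration(400, 250, 12542, 5.0)
```

Because the tablet value carries an uncertainty of ± 2,081 spores, a concentration computed in this way inherits a relative uncertainty of roughly 17% from the marker alone.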

Numerical methods

The modern climatic data for each sampling site were interpolated using a database from China’s Meteorological Administration and from http://www.ncdc.noaa.gov/oa/ncdc.html. The spatial interpolation technique used is gradient plus inverse distance squared, which accounts for the variation of meteorological elements along gradients of elevation, latitude and longitude (Lin et al. 2002). Three of the main climatic parameters, mean annual precipitation (MAP), mean annual temperature (MAT) and July mean temperature (TJuly), were used.
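The interpolation step can be illustrated with one common formulation of gradient plus inverse distance squared, in which each station value is first shifted to the target location using linear gradients in x, y and elevation, and the shifted values are then averaged with weights of 1/d². The sketch below rests on that assumption and may differ in detail from the implementation of Lin et al. (2002). The gradients are passed in as given (in practice they would be estimated by regressing station values on the three coordinates), and the function names, station coordinates and coefficients are all hypothetical.

```python
def gids_estimate(stations, target, gradients):
    """Gradient plus inverse-distance-squared estimate at a target site.

    stations:  list of (x, y, elevation, value) tuples
    target:    (x, y, elevation) of the site to estimate
    gradients: (cx, cy, ce), change in value per unit x, y and elevation
    """
    cx, cy, ce = gradients
    tx, ty, te = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, elev, value in stations:
        d2 = (tx - x) ** 2 + (ty - y) ** 2  # squared horizontal distance
        if d2 == 0:
            # Station coincides with the target: only the elevation shift applies.
            return value + (te - elev) * ce
        # Shift the station value to the target position along the gradients.
        adjusted = value + (tx - x) * cx + (ty - y) * cy + (te - elev) * ce
        weighted_sum += adjusted / d2
        weight_total += 1.0 / d2
    return weighted_sum / weight_total


def linear_field(x, y, elev):
    """Hypothetical temperature field (deg C) used only to check the sketch."""
    return 10.0 + 0.01 * x - 0.02 * y - 0.006 * elev


# Hypothetical stations: x and y in km, elevation in m.
sites = [(0, 0, 1500), (100, 0, 1200), (0, 100, 2800), (100, 100, 2000)]
stations = [(x, y, e, linear_field(x, y, e)) for x, y, e in sites]
estimate = gids_estimate(stations, (40, 60, 1800), (0.01, -0.02, -0.006))
```

When the underlying field is exactly linear and the gradients are correct, as in this check, the estimate reproduces the true value at the target; with real station data the inverse-distance weighting smooths the local departures from the fitted gradients.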

Human influences, such as cultivation, irrigation, deforestation and urbanization, are pervasive and intensive in our study area, and will potentially blur natural vegetation-climate relationships (cf. Birks and Seppä 2004). Therefore, in this study we used the human influence index (HII) as a proxy to quantify the impact of human activity on surface pollen spectra. The HII values for individual sampling sites were derived from the global HII dataset with a spatial resolution of 1 km² (Sanderson et al. 2002; WCS/CIESIN 2005). This dataset has been produced by integrating several indices that reflect the intensity of human impacts, including human population density, infrastructure and accessibility. The HII values vary from a minimum of 0 to a maximum of 64 (Sanderson et al. 2002; WCS/CIESIN 2005).

Ordination methods are widely used to investigate the relationships among pollen taxa, vegetation and environmental variables. Detrended correspondence analysis (DCA) was first applied to the pollen data to determine whether linear- or unimodal-based techniques should be used in subsequent ordination analyses. In this study, only pollen taxa with percentages > 1% in at least three samples were used in the ordination analysis. The DCA results showed that the gradient lengths of the first four axes were less than 2.335 standard deviation (SD) units, indicating that linear techniques were appropriate for the data (Birks 1998). Therefore, redundancy analysis (RDA) was conducted to explore site-to-site similarities and the primary taxonomic variation patterns among samples, as well as to determine the effects of different environmental factors on the surface pollen spectra. In addition, to quantify the relative impact of the different environmental variables on pollen data, both RDA and partial RDA were performed. In the RDA analysis, a total of 37 pollen taxa was chosen as the dataset, and elevation, three climatic variables (MAP, MAT and TJuly), six vegetation/landscape types (farmland, wasteland, desert, steppe/meadow, forest and river valley) and human influence index (HII) were used as the environmental variables (11 explanatory variables). In the variation partitioning analysis, three groups of variables (vegetation type, climate and HII) were used. The statistical significance of the RDA was assessed using a Monte Carlo permutation test (499 unrestricted permutations). In addition, both the conditional and simple effects of the different variables and their shared effects were also tested. All the numerical analyses were performed using CANOCO v.4.5.2 software (ter Braak and Šmilauer 2003) following guidelines from Šmilauer and Lepš (2014) and the pollen percentages were normalized using a log transformation.
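Two of the steps above lend themselves to a short sketch: the selection of taxa exceeding 1% in at least three samples, and the Monte Carlo permutation test. The example below is illustrative only and does not reproduce CANOCO. As a loudly stated simplification, the test statistic is the fraction of variance in a single response explained by a single predictor, standing in for the sum of canonical eigenvalues of an RDA. The p-value convention of (number of permutations at least as extreme + 1) / (number of permutations + 1) is assumed, and all names and data are hypothetical.

```python
import random


def select_taxa(percentages, min_pct=1.0, min_samples=3):
    """Keep taxa whose percentage exceeds min_pct in at least min_samples samples.

    percentages maps a taxon name to a list of percentages, one per sample.
    """
    return {
        taxon: values
        for taxon, values in percentages.items()
        if sum(1 for v in values if v > min_pct) >= min_samples
    }


def r_squared(x, y):
    """Fraction of the variance in y explained by a straight-line fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def permutation_p_value(x, y, n_perm=499, seed=0):
    """Monte Carlo p-value under unrestricted permutation of the response."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Count permutations that explain at least as much as the real data.
        if r_squared(x, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical percentages for two taxa in four samples.
taxa = {"Artemisia": [12.0, 0.5, 3.2, 8.1], "Ulmus": [0.0, 1.4, 0.2, 0.9]}
kept = select_taxa(taxa)

# Hypothetical, perfectly linear relationship between MAP (mm) and a taxon.
map_mm = [33, 60, 95, 140, 180, 230, 290, 340, 400, 430, 460, 480]
taxon_pct = [1.0 + 0.1 * v for v in map_mm]
p_value = permutation_p_value(map_mm, taxon_pct)
```

Under this convention the smallest p-value attainable with 499 permutations is 1/500 = 0.002, which is the value obtained when no permuted dataset explains as much variance as the observed one.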

The pollen representation (R) and relative representation (Rrel) values of the major pollen taxa were calculated according to the definition in Davis (1963) and Andersen (1970) to quantitatively assess pollen-vegetation relationships. Amaranthaceae (i.e. Chenopodiaceae), because of its high occurrence in most of the samples and its distinctive pollen morphology, was chosen as a reference taxon for calculating the representation of other taxa. Samples with vegetation cover > 10% were selected in the calculation. Thus, R and Rrel were calculated as follows:

$$R_{\text{rel}} = R_{\text{taxon}} / R_{\text{Amaranthaceae}}$$

where R is the ratio of a taxon's pollen percentage to the percentage of the total vegetation cover accounted for by that taxon.
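As a worked illustration of these definitions, the sketch below computes R and Rrel for a single sample. The function names and all percentages are hypothetical and are not taken from the study data; the only elements drawn from the text are the definitions themselves and the use of Amaranthaceae as the reference taxon.

```python
def representation(pollen_pct, cover_pct):
    """R: pollen percentage divided by the taxon's share of vegetation cover."""
    if cover_pct <= 0:
        raise ValueError("R is undefined where the taxon has no recorded cover")
    return pollen_pct / cover_pct


def relative_representation(r_taxon, r_reference):
    """Rrel: R of a taxon relative to R of the reference taxon."""
    return r_taxon / r_reference


# Hypothetical sample: (pollen %, share of vegetation cover %) for each taxon.
sample = {
    "Amaranthaceae": (40.0, 20.0),
    "Artemisia": (25.0, 5.0),
    "Poaceae": (10.0, 50.0),
}

r_values = {taxon: representation(p, c) for taxon, (p, c) in sample.items()}
r_rel = {
    taxon: relative_representation(r, r_values["Amaranthaceae"])
    for taxon, r in r_values.items()
}
```

In this hypothetical sample Artemisia has Rrel = 2.5 (over-represented relative to Amaranthaceae) and Poaceae has Rrel = 0.1 (under-represented). Rrel for the reference taxon is 1 by construction.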

Results

Pollen assemblages

A total of 63 pollen and spore taxa were identified from 143 topsoil samples and a summary percentage pollen diagram with 37 selected taxa is shown in Fig. 3. The pollen assemblages are dominated by herb pollen (with average percentages > 64%), especially Amaranthaceae, Artemisia, Poaceae, Brassicaceae, cereal-type, Cyperaceae and Asteraceae. The total tree pollen percentages in the samples from forest and river valley are > 15%, while the tree pollen percentages are < 5% in farmland and the other sampling sites. The shrub pollen percentages in desert and river valley samples are generally > 20% and > 10%, respectively, while at the other sampling sites shrubs are less well represented, with values < 10%.

Fig. 3 Pollen diagram for the topsoil samples from arid and semi-arid northwestern China. Only major taxa are shown

As shown in Fig. 3, the abundance of each pollen taxon varies in different vegetation types. Samples S001-S054 are from farmland where wheat, barley, maize, rapeseed, cotton and sunflower, as well as various vegetables and fruits (including grapes, watermelon, potatoes, peppers, fennel and radishes), are cultivated. These samples have relatively high pollen frequencies of Amaranthaceae and cultivated plants, mainly cereal-type (up to 72.3%, mean of 16.3%), Brassicaceae (up to 90.9%, mean of 20.1%) and Vitaceae (up to 65.8%). The pollen assemblages from wasteland surrounding the farmland are dominated by Amaranthaceae (mean of 27.3%), Artemisia (mean of 20.2%), and other taxa include Poaceae, Asteraceae and Cyperaceae; in addition, the pollen of cultivated plants is significantly reduced. In the samples from desert, the most abundant pollen type is Amaranthaceae, with a maximum of 86.6% and a mean of 48.5%, which is the highest representation amongst all the vegetation categories studied. In addition, in some areas of desert, Nitraria, Ephedra and Tamarix are common pollen types, and the pollen of cultivated plants is the least well represented. The steppe/meadow samples are dominated by Cyperaceae, with a maximum percentage of 76.3% and a mean of 38.2%. Other taxa have relatively high frequencies, including Poaceae (mean of 11.1%), Asteraceae (mean of 8.3%) and Thalictrum (mean of 2.2%). In the forest samples, the pollen assemblages are dominated by Picea (up to 39.7%, mean of 13.6%) and Cyperaceae (up to 50%, mean of 24.2%). The pollen assemblages from river valleys are dominated by Cyperaceae (mean of 38.5%), and have the highest Cyperaceae representation in all the samples studied. In contrast, the frequencies of Amaranthaceae (mean of 4.9%), Artemisia (mean of 3.6%) and Poaceae (mean of 3.2%) in the river valley samples are the lowest in the data set.

The pollen concentration varies according to the different vegetation types. In the farmland, the total pollen concentration ranges from 1,394 to 74,743 grains/g, with a mean value of 15,782 grains/g, which is the lowest in all the samples. By contrast, the mean total pollen concentration of samples from river valley is the highest, at 617,143 grains/g. The mean pollen concentrations of samples from wasteland, desert and steppe/meadow are 36,923 grains/g, 22,706 grains/g and 45,067 grains/g, respectively.

Results of numerical analysis

RDA

The results of RDA ordination of the percentages of the 37 taxa and the environmental variables are presented in Fig. 4 and Table 1. The first four axes of the RDA are significantly correlated (p = 0.002) with all the environmental variables, and the first two axes explain 28.4% (axis 1: 18.8%, axis 2: 9.6%) of the total variance. The results also show that all the environmental variables are statistically significantly related to the variance in the pollen data (p = 0.002), with MAP capturing the largest proportion (Table 1). Among the environmental variables, the temperature variables (MAT and TJuly) are strongly positively correlated with each other, and they are very strongly negatively correlated with MAP, while MAP is positively correlated with elevation. HII is not correlated with any climate variables but is strongly positively correlated with farmland. In addition, the temperature variables are positively correlated with desert. According to the RDA, climate variables and vegetation types are the primary factors that control the variations of the surface pollen data. HII values are weakly correlated with the first two axes (axis 1: − 0.1165, axis 2: − 0.4492), indicating that HII has a less important influence on the surface pollen spectra than climate and vegetation. However, HII is significantly correlated with several pollen taxa: e.g. cereal-type, Brassicaceae, Apiaceae, Elaeagnaceae and Cannabaceae (Fig. 4A). Most of the farmland samples and some desert and wasteland samples are also correlated with HII (Fig. 4B). Therefore, as would be expected, human influences in the study area are mainly reflected in the surface pollen spectra from farmland, and to a lesser extent in those from wasteland and desert (Fig. 4). In addition, human influences are reflected by changes in cereal-type, Brassicaceae, Apiaceae, Elaeagnaceae and Cannabaceae, because of the high correlations between these pollen taxa and HII.

Fig. 4 Results of redundancy analysis (RDA) of surface pollen spectra from arid and semi-arid northwestern China. A Biplots of the RDA results based on the surface pollen taxa. B Biplots of the RDA results based on the surface pollen samples

Table 1 RDA results for modern pollen assemblages and environmental variables

The RDA results based on pollen samples (Fig. 4B) indicate that the first two axes (λ1 = 0.186, λ2 = 0.098) primarily differentiate most of the samples from farmland and the other vegetation types. They also separate samples from sites at high altitude and with high MAP from samples from sites characterized by high temperatures (MAT and TJuly). However, a group of samples from wasteland overlaps with the samples from desert, and at the same time, the samples from steppe/meadow, forest and river valley are overlapping. The sample-environment biplots show that MAP and elevation are closely related to the negative side of axis 1, while MAT and TJuly are positioned to the right of axis 1; in addition, HII is negatively correlated with axis 2. The pollen spectra from high altitudes and with high MAP are generally located to the left of the first axis (steppe/meadow, forest and river valley), while the spectra from lower altitudes and with high MAT/TJuly but low MAP are located on the positive side of axis 1 (i.e. desert). In addition, the sites with intensive human activity (high HII values; i.e. farmland and most of the wasteland) are located on the negative side of axis 2, while the samples from areas with predominantly natural vegetation (desert, steppe, forest and river valley) are located on the other side of axis 2.

In addition, the RDA results for the pollen types (Fig. 4A) clearly reveal the relationships between pollen taxa, vegetation types and climate. Cereal-type is positively associated with farmland and HII, but weakly related with climate. Ephedra, Nitraria and Amaranthaceae are positively associated with desert, MAT and TJuly, and negatively correlated with MAP and elevation - that is associated with high temperatures but low precipitation. Cyperaceae, Thalictrum, Picea, Polypodiaceae, Caryophyllaceae and Brassicaceae are positively correlated with each other, and strongly associated with high elevation and high precipitation, but low temperature. Brassicaceae is also positively correlated with HII and farmland. Forest, steppe/meadow and river valley have no clear relationship with any single pollen taxon or climatic variable.

A Monte Carlo permutation test of the RDA results (Table 1) indicates that the environmental variables of MAP and temperature, farmland, wasteland and desert explain > 2% of the variance and are statistically significant (p < 0.05). Most of the total explained variation in pollen assemblages (40.8%, 59.2% unexplained) is explained by MAP (16%), three vegetation types (farmland, wasteland and desert, 15%) and temperature (MAT and TJuly, 5%) (36% in total). The vegetation types, that is steppe/meadow, forest and river valley, are not significant environmental variables.

Variance partitioning

In an analysis of the partitioning of the variance (Fig. 5; Table 2), the significance tests of the conditional and simple effects show that all the proportions tested are significant (p < 0.05). Vegetation, climate and HII together explain 38.7% of the variance in the pollen data (Fig. 5). Decomposition of the pollen assemblage variance suggests that the largest proportion (14.6%) is accounted for by the effect of vegetation alone, while climate alone accounts for 12.6% of the variance, and the HII alone for only 1% (Fig. 5). The shared effect of vegetation and climate accounts for 7.7%, while that of vegetation and HII, as well as climate and HII, account for 2 and 0.6%, respectively.
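The fractions quoted above can be checked with simple arithmetic. The sketch below adds up the six fractions given in the text and compares the result with the stated total of 38.7%. Attributing the small remainder to the fraction shared by all three groups, to rounding, or to both is an assumption made here, because the text does not report that fraction. The variable names are hypothetical.

```python
# Fractions of pollen variance (%) as quoted in the text.
fractions = {
    "vegetation alone": 14.6,
    "climate alone": 12.6,
    "HII alone": 1.0,
    "vegetation and climate shared": 7.7,
    "vegetation and HII shared": 2.0,
    "climate and HII shared": 0.6,
}
total_explained = 38.7  # vegetation, climate and HII together

listed = round(sum(fractions.values()), 1)
# Remainder not covered by the six listed fractions; assumed here to be the
# three-way shared fraction and/or rounding.
remainder = round(total_explained - listed, 1)
unexplained = round(100.0 - total_explained, 1)
```

The six listed fractions sum to 38.5%, leaving 0.2 percentage points unaccounted for within the 38.7% total, and 61.3% of the variance unexplained by the three groups of variables.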

Fig. 5 Results of variance partitioning analysis for three groups of environmental variables (vegetation types, climate and HII). A Proportions of three individual variables or groups of variables (a–g). B The amount of explained variance (in percentages) is shown for vegetation type, climate and HII

Table 2 Tests of the conditional and simple effects of individual variable groups and combinations of variable groups on the variation in the pollen data

Pollen relative representation ratio

The representation (R) and relative representation (Rrel) values in the study area show a broad range for each pollen taxon (Table 3). However, a semi-quantitative sequence of relative representation from under- to over-representation of the major pollen types can be established. The trend of variation from under-representation to over-representation is as follows: Fabaceae < Asteraceae < Zygophyllaceae < Tamarix < Poaceae < Nitraria < Amaranthaceae < Ephedra < Artemisia. Relative to Amaranthaceae, Artemisia and Ephedra have high Rrel values (median values of 2.49 and 1.84, respectively), while other pollen types (Poaceae, Fabaceae, Asteraceae, Zygophyllaceae, Tamarix and Nitraria) have low Rrel values (median < 1).

Discussion

Pollen-vegetation relationships

The surface samples indicate that the pollen assemblages of different vegetation types contain different representative components (Fig. 3). Cereal-type (≥ 15%) and Brassicaceae (≥ 20%) are the dominant taxa in farmland; wasteland surrounding the farmland is characterized by a high diversity of the pollen of non-cultivated plants; and Amaranthaceae (≥ 45%), Artemisia (≥ 10%) and Ephedra (≥ 10%) are the most important desert taxa. However, the RDA results show that, in our study area, pollen assemblages can only distinguish farmland from other vegetation types (Fig. 4). Cyperaceae (≥ 35%), Thalictrum (≥ 2%) and Picea (≥ 10%) are the dominant pollen types in steppe/meadow, forest and river valleys (Fig. 3). In addition, the RDA results show that discriminating steppe/meadow, forest and river valley may be difficult because the steppe/meadow samples are grouped together with forest and river valley samples. Moreover, although the samples from desert and wasteland overlap in the RDA plot (Fig. 4B), the pollen assemblages from these vegetation types can be separated based on the percentages of Amaranthaceae, Artemisia and Ephedra: that is, Amaranthaceae (≥ 45%) and Ephedra (≥ 10%) are high, but Artemisia (≥ 10%) is low, in desert; and Amaranthaceae (≥ 25%) and Artemisia (≥ 2%) are high, but Ephedra (< 5%) is low, in wasteland. Comparison with previous investigations in northwestern China (Luo et al. 2009; Li et al. 2017) reveals both similarities and differences. For example, all the studies show that desert is characterized by high percentages of Amaranthaceae, and Cyperaceae is the dominant pollen type in steppe. However, the results of both Luo et al. (2009) and our study show that Ephedra and Artemisia are the main components of desert, while Li et al. (2017) suggest that these two pollen taxa are associated with steppe. The discrepancies among these studies may be the result of differences in sampling method and location, together with the definition of vegetation types.

Various types of farmland are dominated by different taxa, although Amaranthaceae contributes significantly to all these samples (Fig. 3). For example, the pollen assemblages of samples from fields planted with wheat, corn and barley (S002–003, S006–018, S031–S048) are characterized by a relatively high abundance of cereal-type and Amaranthaceae; while samples from rape fields (e.g. S001, S004–S005, S027–S030, S049–S054) are dominated by Brassicaceae. Similarly, the characteristic feature of the pollen spectra from vineyards (S019, S022–S024) is the predominance of Vitaceae and Amaranthaceae. However, the pollen assemblages from cotton fields and fields with mixed crops are dominated by Amaranthaceae and Poaceae but they have relatively low values of Malvaceae, Solanaceae, Cucurbitaceae and Cannabaceae, indicating that the pollen of the dominant crops in these fields is under-represented. Nevertheless, the samples from these types of farmland are characterized by the pollen of Malvaceae, Solanaceae, Cucurbitaceae and Cannabaceae, although the percentages are relatively low.

The representation of several major pollen types was also estimated by a qualitative comparison of the abundance of each pollen taxon and its observed vegetation coverage around the sampling site. Amaranthaceae pollen occurs in all the samples even though no Amaranthaceae plants were growing near the sampling sites (steppe/meadow). This can be attributed to the high degree of preservation and long-distance dispersal capacity of Amaranthaceae. In addition, Artemisia is relatively abundant in almost all the samples and is clearly over-represented (Fig. 3; Table 3). These results agree with the findings of many previous studies of surface pollen spectra from arid China (Herzschuh et al. 2003; Yang et al. 2004; Luo et al. 2009; Zhao and Herzschuh 2009; Wei and Zhao 2016).

Table 3 Representation (R) and relative representation (Rrel) values of major pollen taxa in arid and semi-arid northwestern China

Poaceae exhibits relatively low percentages in all samples even though gramineous plants are abundant in the steppe vegetation (in sample S118, its vegetation cover is up to 90%), suggesting that Poaceae is under-represented (Table 3). Similar conclusions were reached in several studies of surface pollen spectra from the steppe and desert regions of northern China (Li et al. 2005; Zhao and Herzschuh 2009; Wei and Zhao 2016).

The pollen of Fabaceae and Asteraceae is present in all samples with a very low abundance, even in samples from sites where the parent plants were abundant (e.g. in sample S057 the Fabaceae plant cover is 80%, but its pollen percentage is 7.2%; in sample S058 the Asteraceae plant cover is 80%, but its pollen percentage is 38.7%). This indicates that these two pollen types are under-represented (Table 3).

Tamarix, Nitraria and Ephedra are the most common taxa in desert. Tamarix has very low pollen percentages at the sample sites where it occurs, while its pollen is not represented at sites where the plant is absent. This suggests that Tamarix is under-represented (Table 3), probably because of its relatively low pollen dispersal capacity (Wei et al. 2009). Nitraria is mainly found in the samples from desert, which indicates that the pollen representation of Nitraria is strongly correlated with the presence of the parent plants and it is moderately represented (Table 3; Herzschuh et al. 2003; Zhao and Herzschuh 2009). Ephedra is present in almost all the samples irrespective of whether the parent plants are present, with the pollen percentages (from 0 to 85.5%) suggesting that Ephedra pollen has a high dispersal capacity and is over-represented (Table 3; Yan and Xu 1989; Xu et al. 1996; Carrión 2002; Herzschuh et al. 2003; Wei et al. 2011).

It can be concluded from the foregoing discussion that the pollen assemblages of surface samples reflect the local vegetation composition moderately well. However, it is noteworthy that the pollen concentration has no direct relationship with vegetation type and vegetation cover. For example, the vegetation cover of farmland (up to 100%) is the highest in all the vegetation types, but the pollen concentration is the lowest, and lower than that of the desert (Fig. 3). There are several possible explanations for this: (1) Farming activities (repeated ploughing and sowing) loosen the soil and promote strongly oxidizing conditions, which does not promote pollen preservation. (2) Various fertilizers have been applied to the farmland creating an alkaline or neutral environment, in which the pollen grains are easily damaged. (3) Several samples from desert may have been collected from near the plant and flowers, resulting in the extremely high pollen concentration of some samples. By contrast, pollen grains could be preserved well in wasteland, steppe/meadow and other landscapes because of the relatively high moisture and compact soil conditions. As mentioned above, the pollen assemblages of soil samples from farmland may reflect the vegetation conditions of several years, whereas the vegetation survey of the farmland reflects the conditions of that sampling year. Nevertheless, the pollen assemblages of farmland may still indicate the vegetation composition of both the farmland and regional vegetation background.

Pollen-climate relationships

The RDA results (Fig. 4; Table 1) demonstrate that the modern pollen assemblages are mainly controlled by climatic variables (MAP and temperature), although human activity also has a significant impact on the vegetation distribution of the study region. Similar results were reported previously from the Qinghai–Tibetan Plateau and northwestern China (Wei and Zhao 2016; Ma et al. 2017). These results reveal that precipitation (MAP) is the dominant factor influencing the modern pollen spectra in the study area. The pollen spectra from steppe/meadow, forest and river valleys at relatively high altitudes, where Cyperaceae, Thalictrum and Picea are the dominant pollen taxa, reflect the relatively high MAP and low MAT. In contrast, the pollen spectra from desert, dominated by Amaranthaceae and Ephedra, reflect relatively low MAP and high MAT. For the samples from farmland and the surrounding wasteland, the RDA results reveal a specific distribution of points in the RDA biplots (Fig. 4B), with samples on the left side from areas with a high MAP and low MAT, while those on the right side are from areas with a high MAT and low MAP; in contrast, the samples in the central part of the plot are from areas with moderate MAP and MAT. This result can be attributed to the different growth conditions of the various cultivated crops. For example, the rape field sites have relatively high MAP but low MAT, the vineyard sites have a relatively high MAT but low MAP, and the wheat and corn field sites have moderate MAP and MAT. Therefore, our results suggest that modern pollen data from human-influenced vegetation sites still potentially enable quantitative climatic reconstruction in similar regions.

The results of the variance partitioning analysis lead to the same conclusion: the climatic variables have a large impact on the pollen assemblages in our study region. As shown in Fig. 5 and Table 2, the climate variables alone explain 12.6% of the variation in the pollen assemblages, while vegetation alone explains 14.6% and HII alone explains only 1%. Vegetation, climate and HII together explain 38.7%. Because pollen assemblages are directly influenced by vegetation, it is understandable that vegetation alone accounts for the largest proportion of the explained variation. Moreover, because the natural vegetation is itself primarily controlled by climate, part of the climatic signal is expressed through vegetation, which may explain why the variation explained by climate alone is slightly lower than that explained by vegetation alone. In addition, vegetation and climate have a combined explanatory effect of 7.7% on the pollen assemblages.
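The bookkeeping behind these fractions can be checked with simple arithmetic. The sketch below uses only the percentages quoted above and assumes that the 7.7% refers to the fraction shared by vegetation and climate; the variable names are illustrative. Note that shared fractions in variance partitioning can be negative, so the remainder computed here is a net value for the overlaps not itemised in the text.

```python
# Fractions quoted in the text (percent of total variation in the pollen data)
unique = {"climate": 12.6, "vegetation": 14.6, "HII": 1.0}
total_explained = 38.7
shared_veg_climate = 7.7   # assumed to be the vegetation-climate shared fraction

unique_sum = sum(unique.values())                # variation explained by single groups alone
shared_all = total_explained - unique_sum        # net of all shared (overlapping) fractions
other_shared = shared_all - shared_veg_climate   # net of overlaps not itemised in the text
unexplained = 100.0 - total_explained            # residual variation

print("Sum of unique fractions:", round(unique_sum, 1))
print("Net shared fractions:", round(shared_all, 1))
print("Net shared fractions other than vegetation-climate:", round(other_shared, 1))
print("Unexplained:", round(unexplained, 1))
```

Under these assumptions the unique fractions sum to 28.2%, the shared fractions to a net 10.5% (of which 2.8% is not itemised in the text), and 61.3% of the variation remains unexplained by the three groups of variables.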

Relationship between pollen spectra and human influences

The RDA results reveal that the HII has less influence on the modern pollen assemblages than the climatic variables and vegetation, in agreement with the results of our variance partitioning analysis and with previous findings (Liu et al. 2006, 2008; Zhang et al. 2010; Li et al. 2014b). The results of the variance partitioning analysis (Fig. 5; Table 2) show that the HII alone explains only 1% of the variation in the pollen assemblages. They also show that the shared effect of HII and vegetation explains more variation than either HII alone or the combined effect of HII and climate. Nevertheless, the HII can differentiate effectively between human-influenced (farmland and wasteland) and natural (desert, steppe, forest and river valley) vegetation categories (Fig. 4B). In addition, the HII is significantly correlated with the pollen frequencies of Apiaceae, Elaeagnaceae, Liliaceae and cultivated plants such as cereal-type, Cannabaceae and Brassicaceae, with the strongest correlation being with cereal-type (Fig. 4A). This result differs from those of previous studies. For example, in a study of alpine meadow and steppe in the Tianshan Mountains, the HII was correlated with Poaceae, Plantago, Polygonaceae and Elaeagnaceae (Wei and Zhao 2016). By contrast, ordination of modern pollen data from northeastern China showed that the HII was significantly correlated with Amaranthaceae pollen (Liu et al. 2006; Zhang et al. 2010). The discrepancy between these studies may result from the high local variability of the HII (Li et al. 2014b).

In our study, cereal-type, Cannabaceae and Brassicaceae are the major pollen components of farmland, while Apiaceae, Elaeagnaceae and Liliaceae are the main components of wasteland. Cereal-type pollen has been considered a direct indicator of human impacts (Whittington et al. 1991; Li et al. 2013, 2014a, 2015). In addition, the pollen of weed taxa such as Apiaceae, Elaeagnaceae and Liliaceae is often used as an indicator of human activity (Li et al. 2008; Ma et al. 2008). However, the relatively high weed pollen frequencies in farmland and wasteland may be due to the increased humidity caused by irrigation systems. The significant correlation between the HII and these pollen types indicates that changes in these types, in particular cereal-type, follow the same trend as the HII values. The relatively pristine vegetation categories (desert, steppe/meadow, forest and river valley) have low HII values and low frequencies of cereal-type, indicating minimal human impacts. In contrast, the human-influenced vegetation categories (farmland and wasteland) have high HII values and high cereal-type pollen frequencies, indicating significant human impacts (Fig. 4). Therefore, human influences almost certainly affect the surface pollen assemblages in the study region. In addition, both cereal-type pollen and the HII values can be used to differentiate between human-influenced and natural vegetation. Thus, our results are a potentially useful reference for pollen-based reconstruction of the history of human activity in the region.

Conclusions

  1. The modern pollen assemblages of topsoil samples from different vegetation types along a transect from Lanzhou to Urumqi in northwestern China are representative of the contemporary vegetation. Pollen spectra from farmland are dominated by cereal-type (≥ 15%) and Amaranthaceae (≥ 20%), whereas those of wasteland are dominated by Amaranthaceae (≥ 25%), Artemisia (≥ 20%), Poaceae (≥ 10%), Asteraceae (≥ 5%) and Cyperaceae (≥ 5%). Amaranthaceae (≥ 45%) and Ephedra (≥ 10%) are the most important taxa in desert, and Cyperaceae (≥ 35%) and Thalictrum (≥ 25%) are the dominant taxa in steppe/meadow. The pollen spectra from forest and river valley are characterized by high frequencies of Picea (≥ 10%) and Cyperaceae (≥ 20%).

  2. RDA indicates that the pollen assemblages are primarily controlled by mean annual precipitation and vegetation type. Thus, the modern pollen spectra from human-influenced vegetation can potentially be used for quantitative climate reconstructions in arid China.

  3. The human influence index (HII) is significantly correlated with cereal-type pollen and can differentiate between human-influenced and natural vegetation. Human influences in the region have a discernible effect on the surface pollen spectra. Our results have implications for interpreting fossil pollen records in northwestern arid China and similar regions.