Introduction

Bacteria are the most abundant organisms in soils and support soil functions through nutrient cycling, organic matter degradation, and other ecological services [1–3]. Features such as bacterial biodiversity and composition are strongly influenced by the location and physicochemical properties of the soil [4]. Numerous studies have investigated the relationship between bacterial communities and environmental factors [5–7]. However, previous studies have shown inconsistent or even contradictory results [8], owing to the high complexity and plasticity of bacterial communities, the different methods used to reveal bacterial diversity, the varying frequency and magnitude of environmental disturbance, and limited knowledge of the spatiotemporal scales of bacterial communities in ecosystems. It is therefore necessary to clarify how environmental factors shape bacterial communities, especially in contaminated settings.

As the demand for oil increases rapidly and certain toxic chemicals found in crude oil accumulate in ecosystems, oil contamination of soil is becoming one of the most severe global environmental problems [9–11]. When exogenous chemicals are released into soil, the environment for bacteria changes greatly, and the overall bacterial community structure shifts to adapt to the new habitat [12]. Therefore, many studies have analyzed bacterial communities in oil-contaminated fields [9–17]. However, most efforts to describe bacterial communities in oil-contaminated fields have focused on phylogenetic composition. Few studies have quantified the influence of both natural and human factors on these bacterial communities. A better understanding of how indigenous bacterial communities are shaped by geographic location, soil physicochemical properties, and oil contamination is vital for soil remediation and the management of oil-contaminated fields.

Hence, this study aims to (1) clarify the bacterial community features of oil-contaminated soil, (2) investigate the impact of oil contamination on soil physicochemical properties, and (3) quantify the influence of oil field geographic location, oil contamination, and soil physicochemical properties on the bacterial community using pyrosequencing. An understanding of bacterial diversity in oil-contaminated soil and its relationship with natural and anthropogenic environmental factors would be useful for soil bioremediation in response to oil contamination.

Materials and Methods

Research Sites and Soil Sampling

Four representative oil fields that together produce approximately 50 % of the total crude oil of China were chosen as the research sites (Fig. S1). The Daqing (DQ) oil field (46° 35′ N, 125° 18′ E), in northeast China, is characterized by a temperate continental monsoon climate with a mean annual rainfall of 428 mm. The Huabei (HB) oil field (39° 10′ N, 116° 20′ E), in the Huabei Plain in northern China, is characterized by a temperate monsoon climate with a mean annual rainfall of 575 mm. The Shengli (SL) oil field (37° 28′ N, 118° 29′ E), in the Yellow River area in northern China, is characterized by a warm temperate continental semi-humid monsoon climate with a mean annual rainfall of 550 mm. The Karamay (XJ) oil field (45° 37′ N, 85° 01′ E), in western China, is characterized by a temperate continental arid climate with a mean annual rainfall of 109 mm. Field research was carried out from September to December 2012. Contaminated soils (DQ, HB, SL, XJ) were collected near the surface (0–10 cm) adjacent to crude oil pumping wells, while uncontaminated soils (DQ(BLK), HB(BLK), SL(BLK), XJ(BLK)) were collected 100 km from the wells. For each oil field, four subsamples of contaminated soil and four subsamples of uncontaminated soil were collected. Soils were sealed in sterile sampling bags, transported to the laboratory in an icebox, and stored in the dark at −20 °C until processing. Since all the fields are located in plain areas, no obvious soil variation was observed during field work. Further statistical analysis showed no significant difference in soil physicochemical properties among the subsamples from each oil field (p > 0.05). Thus, the four subsamples of contaminated soil and the four subsamples of uncontaminated soil from each field were pooled separately prior to experiments to reduce the effects of fine-scale spatial heterogeneity on the bacterial community [18].

Geographical, Contaminant, and Physicochemical Indices

Latitude and longitude coordinates of all the samples were measured with the Global Positioning System (GPS) during field work. Precipitation data were obtained from the China Meteorological Data Sharing Service System [19]. The concentration of total petroleum hydrocarbons (TPH) was determined using the ultrasonic-Soxhlet extraction gravimetric method [20]. The four components of TPH (saturates, naphthene aromatics, polar aromatics, and asphaltenes) were separated using the Corbett method according to ASTM D4124 [21]. The following indices of soil physicochemical properties were measured according to the recommended soil testing procedures [22]: soil texture, density, water content (WC), pH, salinity, total nitrogen (TN), Olsen-nitrogen (Olsen-N), total phosphorus (TP), Olsen-phosphorus (Olsen-P), total potassium (TK), Olsen-potassium (Olsen-K), and organic matter (OM) (Table 1).

Table 1 The geographical and physicochemical data of the four oil fields

DNA Extraction and PCR Amplification

Microbial DNA was extracted from the eight samples from the four research sites using a PowerSoil DNA Isolation kit (MO BIO Laboratories, Inc., West Carlsbad, CA) according to the manufacturer’s protocol. The V1–V3 region of the bacterial 16S ribosomal RNA gene was amplified by polymerase chain reaction (95 °C for 2 min, followed by 25 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s, and a final extension at 72 °C for 5 min) using primers 27F (5′-AGAGTTTGATCCTGGCTCAG-3′) and 533R (5′-TTACCGCGGCTGCTGGCAC-3′). PCR reactions were performed in a 20-μL mixture containing 4 μL of 5× FastPfu buffer, 2 μL of 2.5 mM dNTPs, 0.8 μL of each primer (5 μM), 0.4 μL of FastPfu polymerase, and 10 ng of template DNA. No DNA was detected for SL(BLK) and XJ(BLK), even after performing multiple DNA extractions, pooling and concentrating them in a spin column, and increasing the number of PCR cycles. The low abundance of the microbial community in these two samples might result from the high salinity of SL(BLK) and XJ(BLK) and the lack of water in XJ(BLK) (Table 1), since high salinity limits microbial access to hydrocarbons and the availability of oxygen [23].

454 Pyrosequencing

After purification using the AxyPrep DNA Gel Extraction Kit (Axygen Biosciences, Union City, CA, USA) and quantification using QuantiFluor™-ST (Promega, USA), a mixture of amplicons was used for pyrosequencing on a Roche 454 GS FLX+ Titanium platform (Roche 454 Life Sciences, Branford, CT, USA) according to standard protocols [24]. The raw reads were deposited in the NCBI Sequence Read Archive (SRA) database (accession number SRP041085).

Bioinformatic Analysis

Pyrosequencing flowgrams were converted to sequence reads by using Mothur [25]. Trimmed sequences were produced by removing low-quality sequences (quality scores <25, sequence lengths <200 bp) and ambiguous reads (ambiguous base >0) using QIIME v. 1.3.0 [26]. Sequences were aligned against the Silva database (SSU111 version: http://www.arb-silva.de/) using k-mer searching (http://www.mothur.org/wiki/Align.seqs) [27]. Potential chimeric sequences were detected using UCHIME (http://drive5.com/uchime) and removed [28]. The high-quality sequences were pre-clustered (http://www.mothur.org/wiki/Pre.cluster) and then clustered using an uncorrected pairwise algorithm. Operational taxonomic units (OTUs) were defined as sharing >97 % sequence identity using the average neighbor method (http://www.mothur.org/wiki/Cluster).
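
The trimming criteria described above can be illustrated with a short sketch. This is not the QIIME implementation; it assumes that the quality score threshold refers to the mean Phred score of a read, and the function name `passes_quality_filter` and its parameter names are hypothetical.

```python
from statistics import mean


def passes_quality_filter(seq, quals, min_mean_qual=25, min_len=200, max_ambiguous=0):
    """Return True if a read meets the trimming thresholds given in the text.

    Assumption: 'quality score' is interpreted as the mean Phred score
    of the read, which may differ from the exact rule applied by QIIME.
    """
    # Discard reads shorter than the minimum length (200 bp in the text).
    if len(seq) < min_len:
        return False
    # Discard reads whose mean quality falls below the threshold (25 in the text).
    if not quals or mean(quals) < min_mean_qual:
        return False
    # Count bases that are not one of the four unambiguous nucleotides.
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return ambiguous <= max_ambiguous
```

A read of 200 unambiguous bases with a uniform quality of 30 passes this filter, whereas the same read with a single `N`, with fewer than 200 bases, or with a mean quality of 20 is rejected.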

Statistical Analysis

Univariate analysis of variance (UNIANOVA), used to test differences in contaminant and physicochemical indices, and Pearson’s correlation analysis were performed with SPSS version 17 (SPSS, Inc., Chicago, IL, USA). The contributions of geographic location (G), oil contamination (C), and physicochemical properties (P) to bacterial community variation and the contributions of geographic location (G) and oil contamination (C) to soil physicochemical property variation were evaluated with variance partitioning analysis using (partial) canonical correspondence analysis (CCA) and redundancy analysis (RDA), respectively, in CANOCO for Windows Version 4.5 [29] (Plant Research International, Wageningen, The Netherlands). Significance was tested by Monte Carlo permutation (999 permutations). Weighted principal component analysis (PCA) was performed in CANOCO for Windows Version 4.5 to show the clustering of the different sites. Pairwise similarities between individual sites were calculated based on the patterns of OTUs0.03 using Jaccard’s similarity index, $C_{\mathrm{Jaccard}} = S_{AB} / (S_A + S_B - S_{AB})$, where $S_{AB}$ is the number of OTUs0.03 common to sites A and B, $S_A$ is the number of OTUs0.03 in sample A, and $S_B$ is the number of OTUs0.03 in sample B [30]. Good’s coverage index, the abundance-based coverage estimator (ACE), the Chao1 richness estimator, and the Shannon-Wiener diversity index were calculated with Mothur 1.15.0 [25]. Venn diagrams and the heatmap were produced with the R (http://www.R-project.org) packages VennDiagram [31] and pheatmap [32], respectively.
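
As a minimal sketch of the Jaccard similarity index defined above, the following function computes the index from two collections of OTU identifiers. The function name `jaccard_similarity` and the OTU labels in the example are hypothetical and are not taken from the study data.

```python
def jaccard_similarity(otus_a, otus_b):
    """Jaccard index: shared OTUs divided by the size of the union.

    Implements S_AB / (S_A + S_B - S_AB) for presence/absence data.
    """
    set_a, set_b = set(otus_a), set(otus_b)
    shared = len(set_a & set_b)                 # S_AB
    union = len(set_a) + len(set_b) - shared    # S_A + S_B - S_AB
    if union == 0:
        raise ValueError("both samples are empty")
    return shared / union


# Hypothetical example: two sites sharing two of four distinct OTUs.
site_a = {"otu1", "otu2", "otu3"}
site_b = {"otu2", "otu3", "otu4"}
similarity = jaccard_similarity(site_a, site_b)  # 2 / (3 + 3 - 2) = 0.5
```

Because the index uses only presence or absence, two sites with the same OTUs but very different abundances would still score 1.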

Results

Basic Characteristics of Geographic Location and Physicochemical Properties

The geographical and physicochemical data are listed in Table 1. The latitudes of all the samples were approximately 40°, while the longitudes varied from 84° to 125°. XJ had the lowest precipitation (109 mm/a), while HB had the highest (575 mm/a). The TPH concentration varied substantially among the contaminated sites, from 25,560 to 183,760 mg/kg, far exceeding the environmental TPH threshold (200 mg/kg) set by the Organization for Economic Cooperation and Development (OECD). The TPH concentration of the uncontaminated soils was 0.00 mg/kg, confirming that they were uncontaminated. UNIANOVA showed that the TPH concentrations of the four oil fields were significantly different (p < 0.01), as were the concentrations of the four main TPH components (saturates, naphthene aromatics, polar aromatics, and asphaltenes) (p < 0.01). Saturates were the dominant component. The other tested soil parameters (soil texture, density, WC, pH, salinity, TN, Olsen-N, TP, Olsen-P, TK, Olsen-K, and OM) differed between contaminated and uncontaminated soil. The physicochemical properties of the uncontaminated soils in the four oil fields were not significantly different (p > 0.05).

Diversity of Bacterial Community

A total of 125,654 (70.1 %) high-quality sequences were obtained and grouped into 4679 OTUs0.03. The number of OTUs0.03 differed among the oil fields (Table 2). Good’s coverage ranged from 89.8 to 99.2 %, except for the bacterial 16S ribosomal RNA (rRNA) library from DQ(BLK) (79.3 %), indicating that most of the libraries reflected the majority of the bacterial community well. According to the Shannon-Wiener index values, the highest level of biodiversity was found in XJ (Table 2). Each oil field possessed a bacterial community that differed significantly from the others based on OTUs0.03 abundance (p < 0.01). Venn diagrams and PCA were used to visualize the overall spatial heterogeneity in bacterial community patterns. The Venn diagram showed that only 229 of 4355 OTUs0.03 were shared by all sites. DQ had the largest proportion of shared OTUs0.03 (72.14 %), while SL had the largest proportion of unique OTUs0.03 (49.17 %) (Fig. S2). The shared portion between HB and SL was the lowest. In the PCA plot, only DQ and HB clustered closely, while SL and XJ were clearly separated, each occupying its own quadrant, indicating that the bacterial communities from SL and XJ were quite different from those of the other sites (Fig. 1).
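
The coverage and diversity statistics reported here were calculated with Mothur. Purely as an illustration of what these indices measure, the sketch below computes Good’s coverage, the Shannon-Wiener index, and a bias-corrected form of Chao1 from a list of per-OTU read counts. The function names and the example counts are hypothetical, and the bias-corrected Chao1 formula is an assumption about the variant used.

```python
import math


def goods_coverage(counts):
    """Good's coverage: 1 - (number of singleton OTUs / total reads)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads")
    singletons = sum(1 for c in counts if c == 1)
    return 1 - singletons / total


def shannon_index(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + n1 * (n1 - 1) / (2 * (n2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    n1 = sum(1 for c in counts if c == 1)  # singletons
    n2 = sum(1 for c in counts if c == 2)  # doubletons
    return observed + n1 * (n1 - 1) / (2 * (n2 + 1))


# Hypothetical library: 7 OTUs, 22 reads, 3 singletons, 2 doubletons.
example_counts = [10, 5, 2, 2, 1, 1, 1]
coverage = goods_coverage(example_counts)   # 1 - 3/22
richness = chao1(example_counts)            # 7 + 3*2 / (2*3) = 8.0
```

A low coverage value, such as the 79.3 % reported for DQ(BLK), corresponds to a library in which a large share of reads belong to OTUs seen only once.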

Table 2 Estimation of the libraries based on 16S rRNA gene pyrosequencing

Fig. 1 Bacterial OTUs0.03 PCA of contaminated and uncontaminated soil. PC Axis 1 explained 50.5 % of the variation; PC Axis 2 explained 38.5 % of the variation

Taxonomic Composition of the Bacterial Community

In addition to the OTU-based analysis, the individual sequences were assigned to phylotypes and grouped into 35 phyla, 92 classes, and 606 genera in oil-contaminated soil. At the phylum level, the soil was dominated by Proteobacteria (38.33 %), Actinobacteria (23.30 %), Bacteroidetes (11.13 %), and Firmicutes (5.33 %). The distribution was unequal among the sites (Fig. 2). In HB, the dominant phylum was Actinobacteria, at approximately 51 % of all sequences, rather than Proteobacteria, as in the other sites. The percentage of Bacteroidetes was relatively low in HB (2.88 %) and XJ (5.89 %), while it reached 24.29 % in SL.

Fig. 2 Phylum compositions of contaminated soil in research sites. Sequences that could not be classified into any known groups were assigned to “unclassified.” Sequences that could be classified into known groups but had a relative abundance <0.5 % in all the four research sites were assigned to “others”
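
The grouping rule stated in the caption (phyla with a relative abundance below 0.5 % in all sites are pooled as “others”) can be sketched as follows. The function name `collapse_rare_phyla`, the site labels, and the abundance values are hypothetical placeholders and do not reproduce the data behind Fig. 2.

```python
def collapse_rare_phyla(abundance_by_site, threshold=0.5, label="others"):
    """Pool phyla whose relative abundance (%) is below `threshold` in every site.

    abundance_by_site maps site -> {phylum: percent of sequences}.
    """
    # A phylum is kept if it reaches the threshold in at least one site.
    all_phyla = set()
    for profile in abundance_by_site.values():
        all_phyla.update(profile)
    keep = {
        phylum
        for phylum in all_phyla
        if any(p.get(phylum, 0.0) >= threshold for p in abundance_by_site.values())
    }
    collapsed = {}
    for site, profile in abundance_by_site.items():
        row = {phylum: pct for phylum, pct in profile.items() if phylum in keep}
        rare_total = sum(pct for phylum, pct in profile.items() if phylum not in keep)
        if rare_total > 0:
            row[label] = rare_total
        collapsed[site] = row
    return collapsed


# Hypothetical two-site example: P3 is below 0.5 % everywhere, P2 is not.
toy = {
    "A": {"P1": 60.0, "P2": 0.3, "P3": 0.2},
    "B": {"P1": 50.0, "P2": 0.7, "P3": 0.1},
}
result = collapse_rare_phyla(toy)
```

In this toy input, P2 is retained in both sites because it exceeds the threshold in site B, while P3 is pooled into “others” everywhere.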

At the class level, the three most abundant classes were Gammaproteobacteria (22.21 %), Actinobacteria (17.69 %), and Alphaproteobacteria (11.88 %). A hierarchical clustering double dendrogram was constructed based on the relative percentages of the 40 most abundant classes (Fig. S3). Gammaproteobacteria, Actinobacteria, and Alphaproteobacteria were dominant in DQ and HB. Bacilli was far more abundant in XJ (13.79 %) than in the other sites, as was Cytophagia in SL (19.74 %). Although prevalent in the other three sites, Actinobacteria was rare in SL (2.88 %). Many of the classes were present in low proportions in all sites.

At the genus level, the 20 most prevalent genera were counted, of which Streptomyces, belonging to the phylum Actinobacteria, was the most abundant (Table S1). The top 20 genera differed considerably among the sites (Table 3). The largest number of genera (407) was detected in XJ, with Marinobacter (6.22 %), Halomonas (6.04 %), and Alcanivorax (4.84 %) as the three most prevalent genera. In particular, Halothiobacillus (1.01 %) and Sediminibacter (0.94 %) were two abundant genera unique to SL, while Haloactinosporathe (1.07 %) was found only in XJ.

Table 3 Top 20 abundant genera of contaminated soil in research sites

Functional Group Distribution

In this study, genera detected in our samples that are known to perform a particular function were identified according to the literature and summarized in Table S2 to analyze their distribution in the oil-contaminated soil [17, 33–48]. The largest percentage of functional genera related to nitrogen metabolism was detected in DQ, while HB possessed the largest proportion of bacteria able to utilize oil as a carbon source, followed by XJ (Fig. 3). XJ also contained the largest percentage of phosphorus metabolism bacteria, Gemmatimonadetes. The sulfur- and sulfate-reducing bacterium Desulfuromonas was prevalent in SL. Significant numbers of the hydrocarbon-degrading bacterium Mycobacterium and the saturate-degrading bacterium Alcanivorax were detected in HB and SL, respectively. No sulfur- and sulfate-reducing bacterium (Desulfuromonas) was detected in HB, and no polyphosphate-accumulating bacterium (Gemmatimonas) was found in XJ.

Fig. 3 Functional groups distribution found in contaminated soil in research sites

Influence Factor Analysis of Geographic Location and Oil Contamination on Physicochemical Properties

As the research sites shared similar latitudes, longitude converted into projected coordinates was selected as the geographic location index. Candidate indices for the contamination factor were the concentration of TPH and the percentages of its four components. To avoid the arch effect in RDA and CCA, Pearson’s correlation analyses were conducted among the candidate contamination indices. The results showed that TPH, saturates, naphthene aromatics, polar aromatics, and asphaltenes were significantly correlated with one another (p < 0.05) (Table S3). TPH, as the sum of the four components, was therefore selected as the contamination factor index.
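
The screening of candidate indices by Pearson’s correlation can be illustrated with a small sketch. The analysis in this study was run in SPSS; the function `pearson_r` below is a hypothetical stand-alone implementation of the correlation coefficient only (no p-value), and the input values are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Invented values: a total and one component that rises in proportion to it.
total_index = [1.0, 2.0, 3.0, 4.0]
component_index = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(total_index, component_index)  # perfectly collinear series
```

When candidate indices are this strongly correlated, retaining a single representative index, as done here with TPH, avoids entering redundant explanatory variables into the ordination.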

Variance partitioning analysis was performed to quantify the contributions of the geographic location factor (G) and the contamination factor (C) to the variation in physicochemical properties. Geographic location contributed the most (28.40 %) (Fig. 4a), whereas oil contamination had a smaller individual effect (5.70 %). The interaction between geographic location and oil contamination was negligible, and 65.80 % of the variation in physicochemical properties could not be explained by these two components. The 5.70 % contribution of oil contamination reflects the changes in physicochemical properties caused by contamination.
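
The logic of partitioning variation between two explanatory sets can be sketched with simple arithmetic. The function `partition_two_factors` is hypothetical and is not the CANOCO procedure. Its inputs are the percentages of variation explained by G alone, by C alone, and by both together, and the example inputs are back-calculated from the fractions reported above (they were not reported directly in the study).

```python
def partition_two_factors(r2_g, r2_c, r2_gc):
    """Split explained variation (in percent) between two explanatory sets.

    r2_g  : variation explained when G is the only explanatory set
    r2_c  : variation explained when C is the only explanatory set
    r2_gc : variation explained when G and C are used together
    """
    if not 0 <= r2_gc <= 100:
        raise ValueError("r2_gc must be a percentage between 0 and 100")
    return {
        "unique_G": r2_gc - r2_c,        # effect of G after removing C
        "unique_C": r2_gc - r2_g,        # effect of C after removing G
        "shared": r2_g + r2_c - r2_gc,   # joint (interaction) fraction
        "unexplained": 100.0 - r2_gc,
    }


# Illustrative inputs, back-calculated from the reported fractions (assumption).
fractions = partition_two_factors(r2_g=28.50, r2_c=5.80, r2_gc=34.20)
rounded = {name: round(value, 2) for name, value in fractions.items()}
```

With these inputs the sketch returns unique fractions of 28.40 % and 5.70 %, a shared fraction of 0.10 %, and 65.80 % unexplained, which matches the reported values and the statement that the interaction was negligible.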

Fig. 4 Influence factor contribution of (a) the physicochemical properties and (b) the bacterial constitution of oil-contaminated soil. G geographic location factor, C contamination factor, P physicochemical properties factor

Comparing the physicochemical properties of contaminated and uncontaminated soil revealed consistent changes in density, WC, TN, TK, and OM. Density, WC, and TK clearly decreased, while TN and OM clearly increased (Table 1). Oil contamination changed the texture of most soils from loam to sandy loam. In addition, Pearson’s correlation analysis was carried out between the soil physicochemical indices and the oil contamination indices to clarify the contribution of oil contamination to specific changes in physicochemical properties (Table S4). OM was significantly positively correlated with oil pollutant contents (r polar aromatics = 0.589, p < 0.01; r saturates = 0.495, p < 0.01). WC was significantly negatively correlated with the pollutants, with the strongest correlation found between WC and saturates (r saturates = −0.621, p < 0.01). In the PCA plot, samples of uncontaminated soil clustered much more closely, while samples of contaminated soil were much more dispersed (Fig. S4). HB showed the greatest distance between its contaminated and uncontaminated soil. Therefore, oil contamination changed the soil physicochemical properties and increased their heterogeneity.

Influence Factor Analysis of Geographic Location, Oil Contamination, and Physicochemical Properties on Bacterial Communities

Pearson’s correlation analysis was also conducted among the physicochemical factors (WC, density, pH, salinity, OM, TN, TP, TK, Olsen-N, Olsen-P, and Olsen-K). The results showed that TN and Olsen-N were significantly correlated (r = 0.891, p < 0.01), as were Olsen-P and Olsen-K (r = 0.521, p < 0.01) (Table S5). TN, TP, and TK were excluded as influence factor indices because of their weak interactions with microbes, while Olsen-N, Olsen-P, and Olsen-K were retained for their different roles in microbial metabolism. Thus, WC, density, pH, salinity, OM, Olsen-N, Olsen-P, and Olsen-K were selected as the physicochemical factor indices.

Together with the geographic location and contamination factor indices, variance partitioning analysis was then performed to quantify the contributions of the geographic location factor (G), the contamination factor (C), and the physicochemical factor (P) to the bacterial community variation represented by OTUs0.03. The total variation was partitioned into the independent effects of G, C, and P, the interactions between any two components (G-C, C-P, and G-P), the joint interaction of all three components (G-C-P), and the remaining unexplained variation (Fig. 4b). A total of 49.28 % of the variation was significantly explained (p < 0.01) by these three components. Geographic location, oil contamination, and physicochemical properties independently explained 2.65, 2.55, and 29.21 % of the total variation, respectively. The interaction between geographic location and physicochemical properties (11.22 %) had a greater influence than the contamination factor alone (2.55 %). Moreover, oil contamination altered soil physicochemical properties, which in turn independently explained 29.21 % of the variation in bacterial community distribution. Thus, oil contamination had a stronger indirect than direct impact on the bacterial community.
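
The reported fractions can be cross-checked with simple bookkeeping. The sketch below uses only the percentages stated in this paragraph; the function name `summarize_partition` is hypothetical, and the remainder it computes is the combined share of the interactions that are not itemized in the text (G-C, C-P, and G-C-P), obtained by subtraction rather than reported by the study.

```python
def summarize_partition(reported_fractions, total_explained):
    """Derive the non-itemized interaction share and the unexplained share (percent)."""
    itemized = sum(reported_fractions.values())
    return {
        "itemized": round(itemized, 2),
        "other_interactions": round(total_explained - itemized, 2),
        "unexplained": round(100.0 - total_explained, 2),
    }


# Percentages quoted in the text for the bacterial community partition.
reported = {"G": 2.65, "C": 2.55, "P": 29.21, "G-P": 11.22}
summary = summarize_partition(reported, total_explained=49.28)
```

Under these figures, the itemized fractions sum to 45.63 %, leaving 3.65 % for the remaining interactions and 50.72 % unexplained.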

A significant difference was found between the OTUs0.03 of contaminated and uncontaminated samples (p < 0.01). PCA was also carried out to visualize the differences in community structure. In the plot, DQ(BLK) and HB(BLK) clustered much more closely than DQ and HB, showing that oil contamination altered the distribution pattern of the bacterial community and might increase its heterogeneity (Fig. 1). This was also supported by the Jaccard similarity index, which was lower between DQ and HB (0.268) than between DQ(BLK) and HB(BLK) (0.332) (Table S6). All these analyses suggest that oil contamination changed bacterial community features, either directly or by changing the physicochemical properties of the soil, and further strengthened the differences among the four oil fields, which are located at different geographic sites.

Discussion

No significant difference in the physicochemical properties of uncontaminated soil was observed among the four sites (p > 0.05). Oil contamination shifted the overall pattern of the soil physicochemical properties (Fig. S4), primarily causing a significant change in texture, a decrease in water content, and an increase in organic matter. Pignatello (1996) suggested that the proportion of sand particles might increase after oil enters the soil, resulting in a change in soil texture. This phenomenon might be caused by the absorption of hydrophobic compounds by humic-coated clay, the high viscosity of oil, and the change in soil sedimentation rate [49, 50]. The significant decline in water content may result from the strongly hydrophobic character of oil [12] and from increased water evaporation caused by the higher soil temperature after oil contamination [16]. Meanwhile, oil contamination could bring about a decrease in soil density, which is consistent with the result of Khamehchiyan’s study [12].

According to the results, physicochemical properties were the dominant factor influencing bacterial community distribution, followed by geographic location. Oil contamination had a stronger indirect influence and a weaker direct influence on the bacterial community, and it significantly changed the community distribution (p < 0.01, Fig. 1, Table S6). In the study of Liang et al., geographic location played a more important role than the physicochemical environment in shaping the microbial community [16]. Because our study included a more comprehensive set of physicochemical parameters and added a specific analysis of the impact of geographic location and oil contamination on physicochemical properties, the question of which factors influence the bacterial community was better addressed. However, approximately 50.72 % of the variation in bacterial composition still could not be explained by the geographic location, contaminant, and physicochemical indices, which is consistent with other studies [51, 52]. This might result from the incompleteness of the selected indices. For instance, natural factors such as climate, temperature differences, and daylight duration affect microbial metabolism [16]; the duration of contamination at the research site and the sampling process can also affect the microbial community [52, 53]; and animals and plants interact closely with microbes [52, 54, 55].

Geographic location, oil contamination, and physicochemical properties shaped the distribution of bacterial communities, and different environmental settings presented distinct patterns. Proteobacteria, Actinobacteria, Bacteroidetes, and Acidobacteria, which play an important role in the cycling of carbon, nitrogen, and other nutrients, were prevalent in the contaminated sites, in line with other research results [56, 57]. Many genera that play a major role in the fate of pollutants and the cycling of nutrients were detected in this study [58, 59] (Table S2). Although the distribution of functional groups has been described, whether the taxa detected were also functionally active in these soils is still unknown. As stated, the physicochemical properties had the strongest influence on the uniqueness of the bacterial community composition and diversity. The Daqing oil field had a large proportion of functional groups, especially those that take part in nitrogen turnover. This might be closely associated with the high nitrogen content of the soil in the Daqing oil field (Table 1). A diverse bacterial community was found in the oil-contaminated soil of the Huabei oil field, particularly hydrocarbon-degrading bacteria such as Mycobacterium. The Huabei oil field had the highest TPH concentration, which might explain why this site contained the highest proportion of hydrocarbon-degrading bacteria compared with other functional groups and sites (Table 1). Unlike in the other oil fields, the dominant phylum in the Huabei oil field was Actinobacteria, an aerobic group that develops in soil with sufficient pore space and gas, as well as effective gas exchange with the atmosphere [60]. The Huabei oil field is located in the Huabei Plain, one of the grain-producing areas of China, and its agricultural loam soil provides a better habitat for aerobic bacteria than the soils of the other oil fields (Table 1). Thus, Actinobacteria predominated in the bacterial community.

The Shengli oil field had Proteobacteria as the dominant phylum and possessed the largest proportion (49.17 %) of local genera such as RF3. Actinobacteria, which was prevalent in the other oil fields, was rarely detected in this oil field, which might be explained by its more severe soil salinization and alkalization (Table 1). The Karamay oil field is well known for its location in the Gobi desert, which receives very low annual precipitation. These special geographic characteristics might account for its unique bacterial community characteristics. The Karamay oil field had the most diverse soil bacterial community and a notable proportion of functional groups involved in oil utilization, indicating that in the barren Gobi desert, oil as a carbon source facilitated the development of hydrocarbon-degrading groups. The largest proportion of bacteria involved in phosphorus turnover might be explained by its having the highest phosphorus content (Table 1). Actinomycetales, which are known to dominate alkaline and particularly arid soils, were found only in the Karamay oil field, and Bacilli, which are adapted to arid environments, were far more abundant there (13.79 %) [61, 62].

This study sheds light on the diversity and composition of bacterial communities in oil-contaminated soil environments, where geographic location, oil contamination, and physicochemical properties lead to significant differences among oil fields across China.